THE
CARUS MATHEMATICAL MONOGRAPHS
Published by
THE MATHEMATICAL ASSOCIATION OF AMERICA
Publication Committee
GILBERT AMES BLISS
DAVID RAYMOND CURTISS
HERBERT ELLSWORTH SLAUGHT
THE CARUS MATHEMATICAL MONOGRAPHS are an
expression of the desire of Mrs. Mary Hegeler Carus, and of
her son, Dr. Edward H. Carus, to contribute to the dissemina-
tion of mathematical knowledge by making accessible at nominal cost
a series of expository presentations of the best thoughts and keenest
researches in pure and applied mathematics. The publication of the
first four of these monographs was made possible by a notable gift to
the Mathematical Association of America by Mrs. Carus as sole
trustee of the Edward C. Hegeler Trust Fund. The sales from these
have resulted in the Carus Monograph Fund, and the Mathematical
Association has used this as a revolving book fund to publish the
fifth and sixth monographs.
The expositions of mathematical subjects which the monographs
contain are set forth in a manner comprehensible not only to teachers
and students specializing in mathematics, but also to scientific
workers in other fields, and especially to the wide circle of thoughtful
people who, having a moderate acquaintance with elementary mathe-
matics, wish to extend their knowledge without prolonged and critical
study of the mathematical journals and treatises. The scope of this
series includes also historical and biographical monographs.
The following books in this series have been published to date:
No. 1. Calculus of Variations, by Gilbert Ames Bliss.
No. 2. Analytic Functions of a Complex Variable, by David Ray-
mond Curtiss.
No. 3. Mathematical Statistics, by Henry Lewis Rietz.
No. 4. Projective Geometry, by John Wesley Young.
No. 5. A History of Mathematics in America Before 1900, by David
Eugene Smith and Jekuthiel Ginsburg.
No. 6. Fourier Series and Orthogonal Polynomials, by Dunham Jack-
son.
No. 7. Vectors and Matrices, by C. C. MacDuffee.
The Carus Mathematical Monographs
NUMBER THREE
MATHEMATICAL
STATISTICS
HENRY LEWIS RIETZ
Professor of Mathematics, The University of Iowa
Published for
THE MATHEMATICAL ASSOCIATION OF AMERICA
by
THE OPEN COURT PUBLISHING COMPANY
LA SALLE • ILLINOIS
Copyright 1927 by
The Mathematical Association of America
Published April 1927
Second Printing 1929
Third Printing 1936
Fourth Printing 1943
Fifth Printing 1947
Reprinted by
John S. Swift Co., Inc.
CHICAGO   ST. LOUIS   CINCINNATI   NEW YORK
PREFACE
This book on mathematical statistics is the third of
the series of Carus Mathematical Monographs. The
purpose of the monographs, admirably expressed by
Professor Bliss in the first book of the series, is "to make
the essential features of various mathematical theories
more accessible and attractive to as many persons as
possible who have an interest in mathematics but who
may not be specialists in the particular theory presented."
The problem of making statistical theory available
has been changed considerably during the past two or
three years by the appearance of a large number of text-
books on statistical methods. In the course of preparation
of the manuscript of the present volume, the writer felt
at one time that perhaps the recent books had covered
the ground in such a way as to accomplish the main pur-
poses of the monograph which was in process of prepara-
tion. But further consideration gave support to the view
that although the recent books on statistical method will
serve useful purposes in the teaching and standardization
of statistical practice, they have not, in general, gone far
toward exposing the nature of the underlying theory,
and some of them may even give misleading impressions
as to the place and importance of probability theory in
statistical analysis.
It thus appears that an exposition of certain essential
features of the theory involved in statistical analysis
would conform to the purposes of the Carus Mathemati-
cal Monographs, particularly if the exposition could be
made interesting to the general mathematical reader.
It is not the intention in the above remarks to imply a
criticism of the books in question. These books serve
certain useful purposes. In them the emphasis has been
very properly placed on the use of devices which facili-
tate the description and analysis of data.
The present monograph will accomplish its main
purpose if it makes a slight contribution toward shifting
the emphasis and point of view in the study of statistics
in the direction of the consideration of the underlying
theory involved in certain highly important methods of
statistical analysis, and if it introduces some of the re-
cent advances in mathematical statistics to a wider range
of readers. With this as our main purpose it is natural
that no great effort is being made to present a well-
balanced discussion of all the many available topics.
This will be fairly obvious from omissions which will be
noted in the following pages. For example, the very im-
portant elementary methods of description and analysis
of data by purely graphic methods and by the use of
various kinds of averages and measures of dispersion are
for the most part omitted owing to the fact that these
methods are so available in recent elementary books that
it seems unnecessary to deal with them in this mono-
graph. On the other hand, topics which suggest making
the underlying theories more available are emphasized.
For the purpose of reaching a relatively large number
of readers, we are fortunate in that considerable portions
of the present monograph can be read by those who have
relatively little knowledge of college mathematics. How-
ever, the exposition is designed, in general, for readers of
a certain degree of mathematical maturity, and presup-
poses an acquaintance with elementary differential and
integral calculus, and with the elementary principles of
probability as presented in various books on college alge-
bra for freshmen.
A brief list of references is given at the end of Chapter
VII. This is not a bibliography but simply includes books
and papers to which attention has been directed in the
course of the text by the use of superscripts.
The author desires to express his special indebtedness
to Professor Burton H. Camp who read critically the
entire manuscript and made many valuable suggestions
that resulted in improvements. The author is also in-
debted to Professor A. R. Crathorne for suggestions on
Chapter I and to Professor E. W. Chittenden for certain
suggestions on Chapters II and III. Lastly, the author
is deeply indebted to Professor Bliss and to Professor
Curtiss of the Publication Committee for important
criticisms and suggestions, many of which were made with
special reference to the purposes of the Carus Mathe-
matical Monographs.
Henry L. Rietz
The University of Iowa
December, 1926
TABLE OF CONTENTS
CHAPTER    PAGE
I. The Nature of the Problems and Underlying
Concepts of Mathematical Statistics .... 1
1. The scope of mathematical statistics
2. Historical remarks
3. Two general types of problems
4. Relative frequency and probability
5. Observed and theoretical frequency distributions
6. The arithmetic mean and mathematical expecta-
tion
7. The mode and the most probable value
8. Moments and the mathematical expectations of
powers of a variable
II. Relative Frequencies in Simple Sampling ... 22
9. The binomial description of frequency
10. Mathematical expectation and standard deviation
of the number of successes
11. Theorem of Bernoulli
12. The De Moivre-Laplace theorem
13. The quartile deviation
14. The law of small probabilities. The Poisson expo-
nential function
III. Frequency Functions of One Variable .... 46
15. Introduction
16. The Pearson system of generalized frequency
curves
17. Generalized normal curves — Gram-Charlier series
18. Remarks on the genesis of Type A and Type B
forms
19. The coefficients of the Type A series expressed in
moments of the observed distribution
20. Remarks on two methods of determining the co-
efficients of the Type A series
21. The coefficients of the Type B series
22. Remarks
23. Skewness
24. Excess
25. Remarks on the distribution of certain transformed
variates
26. Remarks on the use of various frequency functions
as generating functions in a series representation
IV. Correlation 77
27. Meaning of simple correlation
28. The regression method and the correlation surface
method of describing correlation
29. The correlation coefficient
THE REGRESSION METHOD OF DESCRIPTION
30. Linear regression
31. The standard deviation of arrays — mean square
error of estimate
32. Non-linear regression — the correlation ratio
33. Multiple correlation
34. Partial correlation
35. Non-linear regression in n variables — multiple cor-
relation ratio
36. Remarks on the place of probability in the regres-
sion method
THE CORRELATION SURFACE METHOD
OF DESCRIPTION
37. Normal correlation surfaces
38. Certain properties of normally correlated dis-
tributions
39. Remarks on further methods of characterizing
correlation
V. On Random Sampling Fluctuations 114
40. Introduction
41. Standard error and correlation of errors in class
frequencies
42. Remarks on the assumptions involved in the deriva-
tion of standard errors
43. Standard error in the arithmetic mean and in a
qth moment coefficient about a fixed point
44. Standard error of the qth moment μ_q about a mean
45. Remarks on the standard errors of various statis-
tical constants
46. Standard error of the median
47. Standard deviation of the sum of independent
variables
48. Remarks on recent progress with sampling errors of
certain averages obtained from small samples
49. The recent generalizations of the Bienayme-
Tchebycheff criterion
50. Remarks on the sampling fluctuations of an ob-
served frequency distribution from the underlying
theoretical distribution
VI. The Lexis Theory 146
51. Introduction
52. Poisson series
53. Lexis series
54. The Lexis ratio
VII. A Development of the Gram-Charlier Series . . 156
55. Introduction
56. On a development of Type A and Type B from the
law of repeated trials
57. The values of the coefficients of the Type A series
obtained from the biorthogonal property
58. The values of the coefficients of type A series ob-
tained from a least-squares criterion
59. The coefficients of a Type B series
Notes 173
Index 178
CHAPTER I
THE NATURE OF THE PROBLEMS AND UNDER-
LYING CONCEPTS OF MATHEMATICAL
STATISTICS
1. The scope of mathematical statistics. The bounds
of mathematical statistics are not sharply defined. It is
not uncommon to include under mathematical statistics
such topics as interpolation theory, approximate integra-
tion, periodogram analysis, index numbers, actuarial
theory, and various other topics from the calculus of ob-
servations. In fact, it seems that mathematical statistics
in its most extended meaning may be regarded as includ-
ing all the mathematics applied to the analysis of quanti-
tative data obtained from observation. On the other
hand, a number of mathematicians and statisticians have
implied by their writings a limitation of mathematical
statistics to the consideration of such questions of fre-
quency, probability, averages, mathematical expectation,
and dispersion as are likely to arise in the characterization
and analysis of masses of quantitative data. Borel has
expressed this somewhat restricted point of view in his
statement¹ that the general problem of mathematical sta-
tistics is to determine a system of drawings carried out
with urns of fixed composition, in such a way that the
results of a series of drawings lead, with a very high degree
of probability, to a table of values identical with the table
of observed values.
¹ For footnote references, see pp. 173-77.
On account of the different views concerning the
boundaries of the field of mathematical statistics there
arose early in the preparation of this monograph ques-
tions of some difficulty in the selection of topics to be in-
cluded. Although no attempt will be made here to answer
the question as to the appropriate boundaries of the field
for all purposes, nevertheless it will be convenient, partly
because of limitations of space, to adopt a somewhat re-
stricted view with respect to the topics to be included. To
be more specific, the exposition of mathematical statistics
here given will be limited to certain methods and theories
which, in their inception, center around the names
of Bernoulli, De Moivre, Laplace, Lexis, Tchebycheff,
Gram, Pearson, Edgeworth, and Charlier, and which have
been much developed by other contributors. These meth-
ods and theories are much concerned with such concepts
as frequency, probability, averages, mathematical expec-
tation, dispersion, and correlation.
2. Historical remarks. While we are currently experi-
encing a period of special activity in mathematical statis-
tics which dates back only about forty years, some of the
concepts of mathematical statistics are by no means of
recent origin. The word "statistics" is itself a compara-
tively new word as shown by the fact that its first occur-
rence in English thus far noted seems to have been in J. F.
von Bielfeld, The Elements of Universal Erudition, trans-
lated by W. Hooper, London, 1770. Notwithstanding the
comparatively recent introduction of the word, certain
fundamental concepts of mathematical statistics to which
attention is directed in this monograph date back to the
first publication relating to Bernoulli's theorem in 1713.
The line of development started by Bernoulli was carried
forward by Stirling (1730), De Moivre (1733), Euler
(1738), and Maclaurin (1742), and culminated in the
formulation of the probability theory of Laplace. The
Théorie Analytique des Probabilités of Laplace published
in 1812 is the most significant publication underlying
mathematical statistics. For a period of approximately
fifty years following the publication of this monumental
work there was relatively little of importance contributed
to the subject. While we should not overlook Poisson's
extension of the Bernoulli theory to cases where the prob-
ability is not constant, Gauss's development of methods
for the adjustment of observations, Bravais's extension of
the normal law to functions of two and three variables,
Quetelet's activities as a popularizer of social statistics,
nevertheless there was on the whole in this period of fifty
years little progress.
The lack of progress in this period may be attributed
to at least three factors: (1) Laplace left many of his re-
sults in the form of approximations that would not readi-
ly form the basis for further development; (2) the follow-
ers of Gauss retarded progress in the generalization of fre-
quency theory by overpromoting the idea that deviations
from the normal law of frequency are due to lack of data;
(3) Quetelet overpopularized the idea of the stability of
certain striking forms of social statistics, for example, the
stability of the number of suicides per year, with the
natural result that his activities cast upon statistics a
suspicion of quackery which exists even to some extent
at present.
An important step in advance was taken in 1877 in
the publication of the contributions of Lexis to the classi-
fication of statistical distributions with respect to normal,
supernormal, and subnormal dispersion. This theory will
receive attention in the present monograph.
The development of generalized frequency curves and
the contributions to a theory of correlation from 1885 to
1900 started the period of activity in mathematical statis-
tics in which we find ourselves at present. The present
monograph deals largely with the progress in this period,
and with the earlier underlying theory which facilitated
relatively recent progress.
3. Two general types of problems. For purposes of
description it seems convenient to recognize two general
classes of problems with which we are concerned in mathe-
matical statistics. In the problems of the first class our
concern is largely with the characterization of a set of
numerical measurements or estimates of some attribute
or attributes of a given set of individuals. For example,
we may establish the facts about the heights of 1,000
men by finding averages, measures of dispersion, and
various statistical indexes. Our problem may be limited
to a characterization of the heights of these 1,000
men.
In the problems of the second class we regard the data
obtained from observation and measurement as a random
sample drawn from a well-defined class of items which
may include either a limited or an unlimited supply. Such
a well-defined class of items may be called the "popula-
tion" or universe of discourse. We are in this case con-
cerned with using the properties of a random sample of
variates for the purpose of drawing inferences about the
larger population from which the sample was drawn. For
example, in this class of problems involving the heights
of the 1,000 men we would be concerned with the ques-
tion: What approximate or probable inferences may be
drawn about the statures of a whole race of men from an
analysis of the heights of a sample of 1,000 men drawn at
random from the men of the race? In dealing with such
questions, we should in the first place consider the diffi-
culties involved in drawing a sample that is truly random,
and in the next place the problem of developing certain
parts of the theory of probability involved in statistical
inference.
The two classes of problems to which we have directed
attention are not, however, entirely distinct with regard
to their treatment. For example, the conceptions of prob-
able and standard error may be used both in describing
the facts about a sample and in indicating the probable
degree of precision of inferences which go beyond the ob-
served sample by dealing with certain properties of the
population from which we conceive the sample to be
drawn. Moreover, a satisfactory description of a sample
is not likely to be so purely descriptive as wholly to pre-
vent the mind from dwelling on the inner meaning of the
facts in relation to the population from which the sample
is drawn.
As a preliminary to dealing in later chapters with cer-
tain of the problems falling under these two general classes
we shall attempt in the present chapter to discuss briefly
the nature of certain underlying concepts. We shall find
it convenient to consider these concepts in pairs as fol-
lows: relative frequency and probability; observed and
theoretical frequency distributions; arithmetic mean and
mathematical expectation; mode and most probable val-
ue; moments and mathematical expectations of a power
of a variable.
4. Relative frequency and probability. The frequency
f of the occurrence of a character or event among s possi-
ble occurrences is one of the simplest items of statistical
information. For example, any one of the following items
illustrates such statistical information: Five deaths in a
year among 1,000 persons aged 30, nearest birthday; 610
boys among the last 1,200 children born in a city; 400
married men out of a total of 1,000 men of age 23; twelve
cases of 7 heads in throwing 7 coins 1,536 times.
The determination of the numerical values of the rela-
tive frequencies f/s corresponding to such items is one of
the simplest problems of statistics. This simple problem
suggests a fundamental problem concerning the probable
or expected values of such relative frequencies if s were
a very large number. When s is a large number, the rela-
tive frequency f/s is very commonly accepted in applied
statistics as an approximate measure of the probability
of occurrence of the event or character on a given occa-
sion.
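The last item in this list can be checked against the theory of fair coins: with 7 coins there are 2⁷ = 128 equally likely outcomes, exactly one of which gives 7 heads. A minimal Python sketch (the fairness of the coins is an assumption, not something the text has yet examined):

```python
from math import comb

# Probability of 7 heads in one throw of 7 fair coins: C(7, 7) / 2**7
p = comb(7, 7) / 2**7        # 1/128 = 0.0078125
s = 1536                     # number of throws reported in the text
f = 12                       # observed cases of 7 heads
print(f / s)                 # observed relative frequency f/s: 0.0078125
print(p * s)                 # expected frequency: 1536/128 = 12.0
```

With the figures quoted, the observed relative frequency 12/1536 happens to equal 1/128 exactly.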
To take an illustration from an important statistical
problem, let us assume that among l persons equally likely
to live a year we find d observed deaths during the year.
That is, we assume that d represents the frequency of
deaths per year among the l persons each exposed for one
year to the hazards of death. If l is fairly large, the rela-
tive frequency d/l is often regarded as an approximation
to what is to be defined as the probability of death of one
such person within a year. In fact, it is a fundamental
assumption of actuarial science that we may regard such
a relative frequency as an approximation to the proba-
bility of death when a sufficiently large number of persons
are exposed to the hazards of death. For a numerical illus-
tration, suppose there are 600 deaths among 100,000 per-
sons exposed for a year at age 30. We accept .006 as an
approximation to the probability in question at age 30.
In the method of finding such an approximation we decide
on a population which constitutes an appropriate class
for investigation and in which individuals satisfy certain
conditions as to likeness. Then we depend on observation
to obtain the items which lead to the relative frequency
which we may regard as an approximation to the proba-
bility.
For an ideal population, let us conceive an urn con-
taining white and black balls alike except as to color and
thoroughly mixed. Suppose further for the present that
we do not know the ratio of the number of white balls to
the total number in this urn which we may conceive to
contain either any finite number or an indefinitely large
number of balls. This ratio is often called the probability
of drawing a white ball. When the number in the urn is
finite, we make drawings at random consisting of s balls
taken one at a time with replacements to keep the ratio
of the numbers of white and black balls constant. If we
may assume the number in the urn to be infinite, the
drawings may under certain conditions be made without
replacements. Suppose we obtain f white balls as a result
of thus drawing s balls, then we say that f/s is the relative
frequency with which we drew white balls. When s is
large, this relative frequency would ordinarily give us an
approximate value of the probability of drawing a white
ball in one trial, that is, an approximate value of the
ratio of white balls to the total number of balls in the
urn.
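The urn scheme just described is easy to imitate numerically. The sketch below assumes a hypothetical urn of 7 white and 3 black balls, draws with replacement, and watches the relative frequency f/s settle toward the true ratio 7/10:

```python
import random

random.seed(1)  # fixed seed so the illustration is repeatable
urn = ["white"] * 7 + ["black"] * 3   # hypothetical composition

for s in (100, 10_000, 1_000_000):
    f = sum(1 for _ in range(s) if random.choice(urn) == "white")
    # the relative frequency f/s approaches the ratio 7/10
    print(s, f / s)
```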
Thus far we have not defined probability, but have
presented illustrations of approximations to probabilities.
While these illustrations seem to suggest a definition, it
is nevertheless difficult to frame a definition that is satis-
factory and includes all forms of probability. The need
for the concepts of relative frequency and probability in
statistics arises when we are associating two events such
that the first may be regarded as a trial and the second
may be regarded as a success or a failure depending on
the result of the trial. The relative frequency of success
is then the ratio of the number of successes to the total
number of trials.
If the relative frequency of success approaches a limit
when the trial is repeated indefinitely under the same set of
circumstances, this limit is called the probability of success
in one trial.
There are some objections to this definition of proba-
bility as well as to any other that we could propose. One
objection is concerned with questioning the validity of
the assumption that a limit of the relative frequency
exists, and another relates to the meaning of the expres-
sion, "the same set of circumstances." That the limit
exists is an empirical assumption whose validity cannot
be proved, but experience with data in many fields has
given much support to the reasonableness and usefulness
of the assumption. The objection based on the difficulty
of controlling conditions so as to repeat the trial under the
same set of circumstances is an objection that could be
brought against experimental science in general with re-
spect to the difficulties of repeating experiments under
the same circumstances. The experiments are repeated as
nearly as circumstance permits.
It seems fairly obvious that the development of sta-
tistical concepts is approached more naturally from this
limit definition than from the familiar definitions suggest-
ed by games of chance. However, we shall at certain
points in our treatment (for example, see § 11) give
attention to the fact that various definitions of proba-
bility exist in which the assumptions differ from those
involved in the above definition. The meaning of proba-
bility in statistics is fairly well expressed for some
purposes by any one of the expressions, theoretical
relative frequency, presumptive relative frequency, or ex-
pected value of a relative frequency. Indeed, we some-
times express the fact that the relative frequency f/s
is assumed to have the probability p as a limit when
s → ∞ in abbreviated form by writing E(f/s) = p, where
E(f/s) is read, "expected value of f/s." It is fairly clear
that in our definition of probability we simply ideal-
ize actual experience by assuming the existence of a limit
of the relative frequency. This idealization, for purposes
of definition, is in some respects analogous to the ideal-
ization of the chalk mark into the straight line of
geometry.
In certain cases, notably in games of chance or urn
schemata, the probability may be obtained without col-
lecting statistical data on frequencies. Such cases arise
when we have urn schemata of which we know the ratio
of the number of white balls to the total number. For
example, suppose an urn contains 7 white and 3 black
balls and that we are to inquire into the probability that
a ball to be drawn will be white. We could experiment by
drawing one ball at a time with replacements until we
had made a very large number of drawings and then esti-
mate the probability from the ratio of the number of
white balls to the total number of balls drawn. It would
however in this case ordinarily be much more convenient
and satisfying to examine the balls to note that they are
alike except as to color and then make certain assump-
tions that would give us the probability without actually
making the trials.
Thus, when all the possible ways of drawing the balls
one at a time may be analyzed into 10 equally likely ways,
and when 7 of these 10 ways give white balls, we assume
that 7/10 is the probability that the ball to be drawn in
one trial will be white. This simple case illustrates the
following process of arriving at a probability:
If all of an aggregate of ways of obtaining successes and
failures can be analyzed into s' possible mutually exclusive
ways each of which is equally likely, and if f' of these ways
give successes, the probability of a success in a single trial
may be taken to be f'/s'.
Thus in throwing a single die, what is the probability
of obtaining an ace? We assume that there are 6 equally
likely ways in which the die may fall. One of these ways
gives an ace. Hence, we say 1/6 is the probability of
throwing an ace. A probability whose value is thus ob-
tained from an analysis of ways of occurrence into sets of
equally likely cases and a segregation of the cases in which
a success would occur is sometimes called an a priori
probability, while a probability whose approximate value
is obtained from actual statistical data on repeated trials
is called an a posteriori or statistical probability.
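The counting process stated in italics above (analyze the aggregate into s' equally likely ways and count the f' that give success) can be sketched directly; the helper function and its name below are ours, not the text's:

```python
from fractions import Fraction

def a_priori(ways, is_success):
    """Probability f'/s': count the successes among s' equally likely ways."""
    s_prime = len(ways)
    f_prime = sum(1 for w in ways if is_success(w))
    return Fraction(f_prime, s_prime)

# The die: 6 equally likely faces, one of which gives an ace.
print(a_priori(range(1, 7), lambda face: face == 1))        # 1/6
# The urn of 7 white and 3 black balls: 10 equally likely draws.
print(a_priori(["w"] * 7 + ["b"] * 3, lambda b: b == "w"))  # 7/10
```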
In making an analysis to study probabilities, difficult
questions arise both as to the meaning and fulfilment of
the condition that the ways are to be "equally likely."
These questions have been the subject of lively debates
by mathematicians and philosophers since the time of
Laplace. It has been fairly obvious that the expression
"equally likely ways" implies as a necessary condition
that we have no information leading us to expect the
event to occur in one of two ways rather than in the other,
but serious doubt very naturally arises as to the suffi-
ciency of this condition. In fact, it is fairly clear that lack
of information is not sufficient. For example, lack of in-
formation as to whether a spinning coin is symmetrical
and homogeneous does not assist one in passing on the
validity of the assumption that it is equally likely to turn
up head or tail. It is when we have all available relevant
information on such matters as symmetry and homoge-
neity that we have a basis for the inference that the two
ways are equally likely, or not equally likely. Similarly,
lack of information about two large groups of men of age
30 would not assist us in making the inference that the
mortality rates or probabilities of death are approximate-
ly equal for the two groups. On the other hand, relevant
information in regard to the results of recent medical
examinations, occupations, habits, and family histories
would give support to certain inferences or assumptions
concerning the equality or inequality of the mortality
rates for the two groups.
5. Observed and theoretical frequency distributions.
In many statistical investigations, it is convenient to par-
tition the whole group of observations into subgroups or
classes so as to show the number or frequency of observa-
tions in each class. Such an exhibit of observations is
called an "observed frequency distribution." As illustra-
tions we present the following, where the rows marked F
are the observed frequency distributions:
Example 1. A = lengths of ears of corn in inches.
A....    3    4.5    6.0    7.5    9.0    10.5    12.0
F....    1    3      20     63     170    67      3
Example 2. A = prices of commodities for 1919 rela-
tive to price of 1913 as a base.
A..  62  87 112 137 162 187 212 237 262 287 312 337 362 387 412 437 462
F..   1   1   5  16  39  66  61  36  38  24   9   3   3   3   1
Example 3. A = heights of men in inches.
A....  61  62  63  64  65  66  67  68  69  70  71  72  73  74
F.... 2 10 11 38 57 93 106 126 109 87 75 23 9 4
In Example 1 the whole group of ears of corn is ar-
ranged in classes with respect to length of ears. The class
interval in this case is taken to be one and one-half inches.
In Example 2 the class interval is a number, twenty-five;
in Example 3, it is one inch.
If the variable x takes values x₁, x₂, . . . . , xₙ with
the corresponding probabilities p₁, p₂, . . . . , pₙ, we call
the system of values x₁, x₂, . . . . , xₙ and the associated
probabilities or numbers proportional to them, the theo-
retical frequency distribution of the variable x. Thus, we
may write for the theoretical frequency distribution of
the number of heads in throwing three coins:
Heads....................   0     1     2     3
Probabilities............  1/8   3/8   3/8   1/8
Theoretical frequencies..   1     3     3     1
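The three-coin distribution above is the case n = 3 of the binomial law, so the probabilities can be generated for any number of coins. A brief sketch (the function name is our own):

```python
from fractions import Fraction
from math import comb

def heads_distribution(n):
    """Theoretical frequency distribution of heads in throwing n fair coins."""
    return {k: Fraction(comb(n, k), 2**n) for k in range(n + 1)}

probs = heads_distribution(3)
for k in sorted(probs):
    print(k, probs[k])                     # 0 1/8, 1 3/8, 2 3/8, 3 1/8
# theoretical frequencies proportional to the probabilities:
print([comb(3, k) for k in range(4)])      # [1, 3, 3, 1]
```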
When for a given set of values of a variable x there exists
a function F(x) such that the ratio of the number of values
of x on the interval ab to the number on the interval a'b'
is the ratio of the integrals

∫_a^b F(x)dx : ∫_{a'}^{b'} F(x)dx ,

for all choices of the intervals ab and a'b', then F(x) is
called the frequency function, or the probability density, or
the law of distribution of the values of x. The curve
y = F(x) is called a theoretical frequency curve, or more
briefly the frequency curve.
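The defining ratio of integrals can be checked numerically for a particular law. The sketch below takes F(x) = exp(-x²/2), the normal law up to a constant factor (which cancels from the ratio), and compares the ratio of counts in a large random sample with the ratio of the corresponding integrals; the sample size, seed, and intervals are our own choices:

```python
import math
import random

def integral(F, a, b, n=10_000):
    # midpoint rule; accurate enough for this illustration
    h = (b - a) / n
    return sum(F(a + (i + 0.5) * h) for i in range(n)) * h

F = lambda x: math.exp(-x * x / 2)   # normal law, up to a constant factor

random.seed(2)
xs = [random.gauss(0, 1) for _ in range(200_000)]
count = lambda a, b: sum(1 for x in xs if a <= x < b)

# ratio of the numbers of values on (0, 1) and (1, 2) ...
print(count(0, 1) / count(1, 2))
# ... is close to the ratio of the corresponding integrals of F
print(integral(F, 0, 1) / integral(F, 1, 2))
```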
To devise methods for the description and characteri-
zation of the various types of frequency distributions
which occur in practical problems of statistics is clearly
Fig. 1. Showing frequency polygon and freehand frequency curve of the distribution of heights of men in Example 3.
of fundamental importance. Such a description or charac-
terization may be effected with various degrees of refine-
ment ranging all the way from one extreme with a simple
frequency polygon or freehand curve (Fig. 1) representing
frequencies by ordinates, to a description at the other
extreme by means of a theoretical frequency curve
grounded in the theory of probability.
It is fairly obvious that the latter type of description
is likely to be much more satisfactory than the former
because a deeper meaning is surely given to an observed
distribution if we can effectively describe it by means of
a theoretical frequency curve than if we can give only a
freehand or an empirical curve as the approximate repre-
sentation. However, we should not overlook the fact that
the description by means of a theoretical curve may be
too ponderous and laborious for the particular purpose
of an analysis. Indeed, the use of the theoretical curve is
likely to be justified in a large way only when it facilitates
the study of the properties of the class of distributions of
which the given one is a random sample by enabling us
to make use of the properties of a mathematical function
F{x) in establishing certain theoretical norms for the de-
scription of a class of actual distributions. As important
supplements to the purely graphic method, we may de-
scribe the frequency distribution by the use of averages,
measures of dispersion, skewness, and peakedness. Such
descriptions facilitate the comparison of one distribution
with another with respect to certain features.
6. The arithmetic mean and mathematical expecta-
tion. The arithmetic mean (AM) of n numbers is simply
the sum of the numbers divided by n. That is, the arith-
metic mean of the numbers

x_1, x_2, ...., x_n

is given by the formula

(1)  AM = (x_1 + x_2 + .... + x_n)/n .

The AM is thus what is usually meant by the terms
"mean," "average," or "mean value" when used without
further qualification. If the values x_1, x_2, ...., x_n occur
with corresponding frequencies f_1, f_2, ...., f_n, respec-
tively, where f_1 + f_2 + .... + f_n = s, then it follows
from (1) that the arithmetic mean is given by

(2)  AM = (f_1 x_1 + f_2 x_2 + .... + f_n x_n)/s

(3)     = (f_1/s)x_1 + (f_2/s)x_2 + .... + (f_n/s)x_n ,

where

f_1/s + f_2/s + .... + f_n/s = 1 .
The arithmetic mean given by (2) is sometimes called
a "weighted arithmetic mean," where f_1, f_2, ...., f_n are
the weights of the values x_1, x_2, ...., x_n, respectively,
and (3) may similarly be regarded as a weighted arith-
metic mean, where

f_1/s, f_2/s, ...., f_n/s

are the weights of x_1, x_2, ...., x_n, respectively.
For our present purpose it is important to note that
the coefficients of x_1, x_2, ...., x_n in (3) are the rela-
tive frequencies of occurrence of these values. By defini-
tion of statistical probabilities, the limiting value of f_t/s
as s increases indefinitely is p_t, where p_t is the assumed
probability of the occurrence of a value x_t among a set
of mutually exclusive values x_1, x_2, ...., x_n. Hence, as
the number of cases considered becomes infinite, the arith-
metic mean would approach a value given by

(4)  AM = p_1 x_1 + p_2 x_2 + .... + p_n x_n ,
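Formulas (2) and (3) can be checked side by side on a small data set; the values and frequencies below are illustrative, not drawn from the text:

```python
# Weighted arithmetic mean: values x_t with frequencies f_t,
# s being the total frequency.
xs = [10, 12, 14, 16]
fs = [2, 5, 2, 1]
s = sum(fs)

am2 = sum(f * x for f, x in zip(fs, xs)) / s       # formula (2)
am3 = sum((f / s) * x for f, x in zip(fs, xs))     # formula (3)
```

The two forms agree, and the weights f_t/s in (3) sum to 1, as the text observes.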
where the probabilities p_1, p_2, ...., p_n may be regard-
ed as the weights of the corresponding values.
The mathematical expectation of the experimenter, or
the expected value of the variable, is a concept that has
been much used by various continental European writers
on mathematical statistics. Suppose we consider the
probabilities p_1, p_2, ...., p_n of n mutually exclusive
events E_1, E_2, ...., E_n, so that p_1 + p_2 + .... + p_n = 1.
Suppose that the occurrence of one of these, say E_t, on
a given occasion yields a value x_t of a variable x. Then
the mathematical expectation or expected value E(x) of
the variable x which takes on values x_1, x_2, ...., x_n
with the probabilities p_1, p_2, ...., p_n, respectively,
may be defined as

(5)  E(x) = p_1 x_1 + p_2 x_2 + .... + p_n x_n .

We thus note by a comparison of (4) and (5) the identity
of the limit of the mean value and the mathematical ex-
pectation.
Furthermore, in dealing with a theoretical distribution
in which p_t is the probability that a variable x assumes a
value x_t among the possible mutually exclusive values
x_1, x_2, ...., x_n, and p_1 + p_2 + .... + p_n = 1, we have

(6)  AM = p_1 x_1 + p_2 x_2 + .... + p_n x_n .
That is, the mathematical expectation of a variable x and
its mean value from the appropriate theoretical distribu-
tion are identical. While there are probably differences
of opinion as to the relative merits of the language involv-
ing mathematical expectation or expected value in com-
parison with the language which uses the mean value of
a theoretical distribution, or mean value as the number
of cases becomes infinite, the language of expectation
seems the more elegant in many theoretical discussions.
For the discussions in the present monograph we shall
employ both of these types of language.
7. The mode and the most probable value. The mode
or modal value of a variable is that value which occurs
most frequently (that is, is most fashionable) if such a
value exists.
Rough approximations to the mode are used consider-
ably in general discourse. To illustrate, the meaning of
the term "average" as frequently used in the newspapers
in speaking of the average man seems to be a sort of crude
approximation to the mode. That is, the term "average"
in this connection usually implies a type which occurs
oftener than any other single type.
The mode presents one of the most striking character-
istics of a frequency distribution. For example, consider
the frequency distribution of ears of corn with respect
to rows of kernels on ears as given in the following table:

A....  10   12   14   16   18   20   22   24
F....   1   16  109  241  235  116   41   10

where A = number of rows of kernels and F = frequency.
It may be noted that the frequency increases up to the
class with 16 rows and then decreases. The mode in rela-
tion to a frequency distribution is a value to which there
corresponds a greater frequency than to values just pre-
ceding or immediately following it in the arrangement.
That is, the mode is the value of the variable for which
the frequency is a maximum. A distribution may have
more than one maximum, but the most common types of
frequency distributions of both theoretical and practical
interest in statistics will be found to have only one mode.
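The mode of the corn distribution tabulated above can be picked out mechanically; a minimal sketch:

```python
# Mode of the frequency distribution of ears of corn by number of
# rows of kernels: the value of A with the greatest frequency F.
rows = [10, 12, 14, 16, 18, 20, 22, 24]
freq = [1, 16, 109, 241, 235, 116, 41, 10]

mode = max(zip(rows, freq), key=lambda rf: rf[1])[0]
```

As the text notes, the frequency rises to 241 at 16 rows and falls thereafter, so the mode is 16.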
The expression "most probable value" of the number
of successes in s trials is used in the general theory of
probability for the number to which corresponds a larger
probability of occurrence than to any other single number
which can be named. For example, in throwing 100 coins,
the most probable number of heads is 50, because 50 is
more likely than any other single number.
This does not mean, however, that the probability of
throwing exactly 50 heads is large. In fact, it is small, but
nevertheless greater than the probability of throwing 49
or any other single number of heads. In other words, the
most probable value is the modal value of the appropriate
theoretical distribution.
8. Moments and the mathematical expectations of
powers of a variable. With observed frequencies f_1, f_2,
...., f_n corresponding to x_1, x_2, ...., x_n, respectively,
and with f_1 + f_2 + .... + f_n = s, the kth order moment, per
unit frequency, is defined as

(7)  μ'_k = (1/s) Σ_{t=1}^{n} f_t x_t^k ,

which is the arithmetic mean of the kth powers of the
variates. For the sake of brevity, we shall ordinarily use
the word "moment" as an abbreviation for "moment per
unit frequency," when this usage will lead to no misunder-
standing of the meaning.
Consider a theoretical distribution of a variable x tak-
ing values x_t (t = 1, 2, ...., n). Let the corresponding
probabilities of occurrence p_t (t = 1, 2, ...., n) be repre-
sented as y-ordinates. Then the moment of order k of the
ordinates about the y-axis is defined as

(8)  μ'_k = Σ_{t=1}^{n} p_t x_t^k .

The mathematical expectation of the kth power of x is
likewise defined as the second member of this equality, so
that the kth moment of the theoretical distribution and
the mathematical expectation of the kth power of the
variable x are identical.
When we have a theoretical distribution ranging from
x = a to x = b, and given by a frequency function (p. 13)
y = F(x), we write in place of (8)

μ'_k = ∫_a^b x^k F(x) dx ,

where F(x)dx gives, to within infinitesimals of higher or-
der, the probability that a value of x taken at random falls
in any assigned interval x to x + dx.
When the axis of moments is parallel to the y-axis
and passes through the arithmetic mean or centroid x̄ of
the variable x, the primes will be dropped from the μ's
which denote the moments. Thus, we write

(9)  μ_k = (1/s) Σ_{t=1}^{n} f_t (x_t − x̄)^k ,

where the arithmetic mean of the values of x is x̄ = μ'_1.
The square root of the second moment μ_2 about the
arithmetic mean is called the standard deviation and is
very commonly denoted by σ. That is, the standard de-
viation is the root-mean-square of the deviations of a set
of numbers from their arithmetic mean. In the language
of mechanics, σ is the radius of gyration of a set of s equal
particles, with respect to a given centroidal axis.
It is often important to be able to compute the mo-
ments about the axis through the centroid from those
about an arbitrary parallel axis. For this purpose the fol-
lowing relations are easily established by expanding the
binomial in (9) and then making some slight simplifica-
tions:

μ_0 = μ'_0 = 1 ,   μ_1 = 0 ,   μ_2 = μ'_2 − μ'_1^2 ,

μ_3 = μ'_3 − 3μ'_1 μ'_2 + 2μ'_1^3 ,

μ_4 = μ'_4 − 4μ'_1 μ'_3 + 6μ'_1^2 μ'_2 − 3μ'_1^4 ,

and, in general,

μ_k = Σ_{i=0}^{k} C(k, i) μ'_{k−i} (−μ'_1)^i ,

where

C(n, i) = n!/[i!(n − i)!]

is the number of combinations of n things taken i at a
time.
These relations are very useful in certain problems
of practical statistics because the moments μ'_k (k = 1, 2,
....) are ordinarily computed first about an axis con-
veniently chosen, and then the moments μ_k about the
parallel centroidal axis may be found by means of the
above relations. In particular, μ_2 = μ'_2 − μ'_1^2 expresses the
very important relation that the second moment μ_2 about
the arithmetic mean is equal to the second moment μ'_2
about an arbitrary origin diminished by the square μ'_1^2
of the arithmetic mean measured from the arbitrary
origin. This is a familiar proposition of elementary me-
chanics when the mean is replaced by the centroid.
When we pass from (9) to corresponding expectations,
the relation μ_2 = μ'_2 − μ'_1^2, written in the form μ'_2 = μ'_1^2 + μ_2,
tells us that the expected value, E(x^2), of x^2 is equal to
the square, [E(x)]^2, of the expected value of x increased
by the expected value, E{[x − E(x)]^2}, of the square of the
deviation of x from its expected value.
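The relations between raw and central moments can be verified on any small data set; the numbers below are an illustrative sketch of my own:

```python
# Raw moments mu'_k (formula 7, with each f_t = 1) and the relations
# mu_2 = mu'_2 - mu'_1^2 and mu_3 = mu'_3 - 3*mu'_1*mu'_2 + 2*mu'_1^3.
xs = [1, 2, 2, 3, 5, 5, 6]
s = len(xs)

def raw_moment(k):
    """kth moment per unit frequency about the origin."""
    return sum(x**k for x in xs) / s

mean = raw_moment(1)
mu2 = sum((x - mean)**2 for x in xs) / s     # second central moment
sigma = mu2**0.5                             # standard deviation

mu3 = sum((x - mean)**3 for x in xs) / s
mu3_from_raw = raw_moment(3) - 3*mean*raw_moment(2) + 2*mean**3
```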
CHAPTER II
RELATIVE FREQUENCIES IN SIMPLE SAMPLING
9. The binomial description of frequency. In Chap-
ter I attention was directed to the very simple process of
finding the relative frequency of occurrence of an event
or character among s cases in question. Let us now con-
ceive of repeating the process of finding relative fre-
quencies on many random samples each consisting of s
items drawn from the same population. To characterize
the degree of stability or the degree of dispersion of such
a series of relative frequencies is a fundamental statistical
problem.
To illustrate, suppose we repeat the throwing of a set
of 1,000 coins many times. An observed frequency dis-
tribution could then be exhibited with respect to the
number of heads obtained in each set of 1,000, or with
respect to the relative frequency of heads in sets of 1,000.
Such a procedure would be a laborious experimental treat-
ment of the problem of the distribution of relative fre-
quencies from repeated trials. What we seek is a mathe-
matical method of obtaining the theoretical frequency dis-
tribution with respect to the number of heads or with
respect to the relative frequency of heads in the sets.
To consider a more general problem, suppose we draw
many sets of s balls from an urn one at a time with re-
placements, and let p be the probability of success in
drawing a white ball in one trial. The problem we set is
to determine the theoretical frequency distribution with
respect to the number of white balls per set of s, or with
respect to the relative frequency of white balls in the sets.
To consider this problem, let q be the probability of
failure to draw a white ball in one trial, so that p + q = 1.
Then the probabilities of exactly m = 0, 1, 2, ...., s
successes in s trials are given by the successive terms of
the binomial expansion

(1)  (q + p)^s = q^s + spq^{s−1} + .... + C(s, m) p^m q^{s−m} + .... + p^s ,

where

C(s, m) = s!/[m!(s − m)!] .
Derivations of this formula for the probability of m
successes in s trials from certain definitions of probability
are given in books on college algebra for freshmen. For a
derivation starting from the definition of probability as a
limit, the reader is referred to Coolidge.* A frequency
distribution with class frequencies proportional to the
terms of (1) is sometimes called a Bernoulli distribution.
Such a theoretical distribution shows not only the most
probable distribution of the drawings from an urn, as
described above, but it serves also as a norm for the dis-
tribution of relative frequencies obtained from some of
the simplest sampling operations in applied statistics. For
example, the geneticist may regard the Bernoulli dis-
tribution (1) as the theoretical distribution of the rela-
tive frequencies m/s of green peas which he would obtain
* See references on pp. 173-77.
among random samples each consisting of a yield of s
peas. The biologist may regard (1) as the theoretical dis-
tribution of the relative frequencies of male births in
random samples of s births. The actuary may regard (1)
as the theoretical distribution of yearly death-rates in
samples of s men of equal ages, say of age 30, drawn from
a carefully described class of men. In this case we specify
that the samples shall be taken from a carefully described
class of men because the assumptions involved in the urn
schemata underlying a Bernoulli distribution do not
permit a careless selection of data. Thus, it would not be
in accord with the assumptions to take some of the
samples from a group of teachers with a relatively low
rate of mortality and others from a group of anthracite
coal miners with a relatively high rate of mortality.
The fact stated at the beginning of this section that
we are concerned with repeating the process of drawing
from the same population is intended to imply that the
same set of circumstances essential to drawing a random
sample shall exist throughout the whole series of draw-
ings.
The expression "simple sampling" is sometimes ap-
plied to drawing a random sample when the conditions
for repetition just described are fulfilled. In other words,
simple sampling implies that we may assume the under-
lying probability p of formula (1) remains constant from
sample to sample, and that the drawings are mutually
independent in the sense that the results of drawings do
not depend in any significant manner on what has hap-
pened in previous drawings.
In Figure 2 the ordinates at x = 0, 1, 2, ...., 7 show
the values of terms of (1) for p = q = 1/2, s = 7. To find
the "most probable" or modal number of successes
m' in s trials, we seek the value of m = m' which gives a
maximum term of (1). To find this value of m, we write
the ratios of the general term of (1) to the preceding and
the succeeding terms. The first ratio will be equal to or
greater than unity when

[(s − m + 1)/m](p/q) ≥ 1 ,  or  m ≤ ps + p .

In the same way, the second ratio will be equal to or
greater than unity when

[(m + 1)/(s − m)](q/p) ≥ 1 ,  or  m ≥ ps − q .

We have, thus, the integer m = m' which gives the modal
value determined by the inequalities

ps − q ≤ m' ≤ ps + p .

We may say therefore that, neglecting a proper fraction,
ps is the most probable or modal number of successes.
When ps − q and ps + p are integers, there occur two equal
terms in (1) each of which is larger than any other term
of the series. For example, note the equality of the first
and second terms of the expansion (5/6 + 1/6)^5.
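The equality of the first and second terms of (5/6 + 1/6)^5 can be checked exactly with rational arithmetic; a minimal sketch:

```python
from fractions import Fraction
from math import comb

# mth term of the binomial expansion (q + p)^s, i.e. the probability
# of m successes in s trials.
def term(s, m, p):
    return comb(s, m) * p**m * (1 - p)**(s - m)

# p = 1/6, s = 5: here ps - q = 0 and ps + p = 1 are both integers,
# so the terms at m = 0 and m = 1 are equal and both maximal.
t5 = [term(5, m, Fraction(1, 6)) for m in range(6)]
```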
10. Mathematical expectation and standard deviation
of the number of successes. Let m̄ be the mathematical
expectation of the number of successes, or, what is the
same thing, the arithmetic mean number of successes in
s trials under the law of repeated trials as defined by
formula (1) on page 23. We shall now prove that m̄ = ps.
By definition (§6),

(2)  m̄ = Σ_{m=0}^{s} m · s!/[m!(s − m)!] · p^m q^{s−m}

(3)    = sp Σ_{m=1}^{s} (s − 1)!/[(m − 1)!(s − m)!] p^{m−1} q^{s−m} = sp ,

since

Σ_{m=1}^{s} (s − 1)!/[(m − 1)!(s − m)!] p^{m−1} q^{s−m} = (q + p)^{s−1} = 1 .

Let d = m − sp be the discrepancy of the number of
successes from the mathematical expectation, and let σ^2
be the mathematical expectation of the square of the dis-
crepancy. By definition,

(4)  σ^2 = Σ_{m=0}^{s} s!/[m!(s − m)!] p^m q^{s−m} (m − sp)^2

        = Σ_{m=0}^{s} s!/[m!(s − m)!] p^m q^{s−m} (m^2 − 2msp + s^2p^2) .
We shall now prove that σ^2 = spq. To do this, we
write m^2 = m + m(m − 1) and obtain for the first term of
(4) the value

(5)  Σ_{m=0}^{s} [m + m(m − 1)] s!/[m!(s − m)!] p^m q^{s−m} = sp + s(s − 1)p^2 .

From (2), (3), (4), and (5), we have

(6)  σ^2 = sp + s(s − 1)p^2 − 2sp·sp + s^2p^2
        = sp(1 − p) = spq .
The measure of dispersion σ is often called the
standard deviation of the frequency of successes in the
population.
Next, we define d/s = (m/s) − p as the relative dis-
crepancy, for it is the difference between the probability of
success and the relative frequency of success. The mean
square of the relative discrepancy is the second member
of equation (4) divided by s^2. It is clearly equal to the
mean square σ^2 of the discrepancy divided by s^2, which
gives

(7)  σ^2/s^2 = pq/s .

The theoretical value of the standard deviation of the
relative frequency of successes is then (pq/s)^{1/2}.
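The results m̄ = ps and σ^2 = spq can be checked directly against the binomial probabilities; the particular s and p below are an illustrative choice:

```python
from math import comb

# Mean and variance of the number of successes under the binomial
# law (1), computed term by term, compared with sp and spq.
s, p = 20, 0.3
q = 1 - p
pmf = [comb(s, m) * p**m * q**(s - m) for m in range(s + 1)]

mean = sum(m * w for m, w in enumerate(pmf))
var = sum((m - mean)**2 * w for m, w in enumerate(pmf))
rel_sd = (var / s**2)**0.5        # standard deviation of m/s, (pq/s)^(1/2)
```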
11. Theorem of Bernoulli. The theorem of Bernoulli
deals with the fundamental problem of the approach of
the relative frequency m/s of success in s trials to the
underlying constant probability p as s increases. The
theorem may be stated as follows:
In a set of s trials in which the chance of a success in
each trial is a constant p, the probability P of the relative
discrepancy (m/s) − p being numerically as large as any as-
signed positive number ε will approach zero as a limit as the
number of trials s increases indefinitely, and the probability,
Q = 1 − P, of this relative discrepancy being less than ε ap-
proaches 1 or certainty.
This theorem is sometimes called the law of large
numbers. The theorem has been very commonly regarded
as the basic theorem of mathematical statistics. But with
the definition of probability (p. 8) as the limit of the rela-
tive frequency, this theorem is an immediate consequence
of the definition. While it adds to the definition some-
thing about the manner of approach to the limit, the
theorem is in some respects not so strong as the corre-
sponding assumption in the definition.
With a definition of probability other than the limit
definition, the theorem may not follow so readily. It has
been regarded as fundamental because of its bearing on
the use of the relative frequency m/s (s large) as if it
were a close approximation to the probability p. Assum-
ing for the present that we have any definition of the
probability p of success in one trial from which we reach
the law of repeated trials given in the binomial expansion
(1), we may prove the Bernoulli theorem by the use of
the Bienaymé-Tchebycheff criterion.
To derive this criterion, consider a statistical variable
x which takes mutually exclusive values x_1, x_2, ...., x_n
with probabilities p_1, p_2, ...., p_n, respectively, where
p_1 + p_2 + .... + p_n = 1.
Let a be any given number from which we wish to
measure deviations of the x's. A specially important case
is that in which a is a mean or expected value of x, al-
though a need not be thus restricted. For the expected
mean-square deviation from a, we may write

σ^2 = p_1 d_1^2 + p_2 d_2^2 + .... + p_n d_n^2 ,

where d_t = x_t − a.
Let d', d'', ...., be those deviations x_t − a which are
at least as large numerically as an assigned multiple
ε = λσ (λ > 1) of the root-mean-square deviation σ from
a, and let p', p'', ...., be the corresponding probabili-
ties. Then we have

(8)  σ^2 ≥ p'd'^2 + p''d''^2 + .... .

Since d', d'', ...., are each numerically equal to or
greater than λσ, we have from (8) that

σ^2 ≥ λ^2σ^2 (p' + p'' + ....) .

If we now let P(λσ) be the probability that a value of
x taken at random from the "population" will differ from
a numerically by as much as λσ, then P(λσ) = p' + p''
+ ...., and σ^2 ≥ λ^2σ^2 P(λσ). Hence

(9)  P(λσ) ≤ 1/λ^2 .
To illustrate numerically, we may take a to be the arith-
metic mean of the x's and say that the probability is not
more than 1/25 that a variate taken at random will
deviate from the arithmetic mean as much as five times
the standard deviation.
A striking property of the Bienaymé-Tchebycheff
criterion is its independence of the nature of the distribu-
tion of the given values.
In a slightly different form, we may state that the
probability is greater than 1 − 1/λ^2 that a variate taken
at random will deviate less than λσ from the mathe-
matical expectation. This theorem is ordinarily known
as the inequality of Tchebycheff, but the main ideas
underlying the inequality were also developed by Bie-
naymé.
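Since the criterion (9) holds whatever the distribution, it can be sketched on an arbitrary discrete distribution; the values and probabilities below are an illustration of my own:

```python
# Tchebycheff inequality (9): the probability of a deviation of at
# least lambda*sigma from the mean never exceeds 1/lambda^2.
xs = [0, 1, 2, 10]
ps = [0.4, 0.3, 0.2, 0.1]

a = sum(p * x for p, x in zip(ps, xs))                    # expected value
sigma = sum(p * (x - a)**2 for p, x in zip(ps, xs))**0.5

lam = 2.0
tail = sum(p for p, x in zip(ps, xs) if abs(x - a) >= lam * sigma)
bound = 1 / lam**2
```

Here the actual tail probability (0.1) is well under the bound 1/λ^2 = 0.25, illustrating that the criterion is a conservative, distribution-free limit.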
We shall now turn our attention more directly to the
theorem of Bernoulli. We seek the probability that the
relative discrepancy (m/s) − p will be numerically as large
as an assigned positive number ε.
We may take ε = λ(pq/s)^{1/2}, a multiple of the theoreti-
cal standard deviation (pq/s)^{1/2} of the relative frequencies
m/s (see §10).
Let P be the probability that

| (m/s) − p | ≥ ε ;

then from the Bienaymé-Tchebycheff criterion (9), we
have P ≤ 1/λ^2.
Since

1/λ^2 = pq/(sε^2) ,

we have

P ≤ pq/(sε^2) .
For any assigned ε, we may by increasing s make P
small at pleasure. That is, the probability P that the
relative frequency m/s will differ from the probability p
by at least as much as an assigned number, however small,
tends toward zero as the number of cases s is indefinitely
increased.
For example, if we are concerned with the probability
P that | (m/s) − p | ≥ .001, we see that P ≤ 1,000,000pq/s.
If the number of trials s is not very large, this inequality
would ordinarily put no important restriction on P. But
as s increases indefinitely, 1,000,000pq remains constant,
and 1,000,000pq/s approaches zero. Again, the probabil-
ity Q = 1 − P that | (m/s) − p | is less than ε satisfies the
condition

(10)  Q ≥ 1 − pq/(sε^2) .
From (10) we see that with any constant pq/ε^2, the
probability Q becomes arbitrarily near 1 or certainty as
s increases indefinitely. Hence the theorem is established
for any definition of probability from which we derive
(1) as the law of repeated trials.
It seems that the statement of the theorem concern-
ing the probable approach of relative frequencies to the
underlying probability may appear simpler and more ele-
gant by the use of the concept of asymptotic certainty
introduced by E. L. Dodd in a recent paper. According
to this concept, we may say it is asymptotically certain
that m/s will approach p as a limit as s increases in-
definitely.
12. The De Moivre-Laplace theorem. The De
Moivre-Laplace theorem deals with the probability that
the number of successes m in a set of s trials will fall
within a certain conveniently assigned discrepancy d from
the mathematical expectation sp. By the inequality of
Tchebycheff (p. 30) a lower limit to the value of this
probability has been given. We shall now proceed to con-
sider the problem of finding at least the approximate
value of the probability. This problem would, in the
simplest cases, involve merely the evaluation and addi-
tion of certain terms of the expansion (1). But this pro-
cedure would, in general, be impracticable when s is large
and d even fairly large. To visualize the problem we
represent the terms of (1) by ordinates y_x at unit inter-
vals, where x marks deviations of m from the mathe-
matical expectation of successes ps as an origin. Then we
have

(11)  y_x = s!/[(sp + x)!(sq − x)!] p^{sp+x} q^{sq−x} .

The probability that the number of successes will lie
within the interval ps − d and ps + d, inclusive of end
values, is then the sum of the ordinates

(12)  y_{−d} + y_{−(d−1)} + .... + y_0 + y_1 + .... + y_d = Σ_{x=−d}^{+d} y_x .
As the number of y's in this sum is likely to be large,
some convenient method of finding the approximate
value of the sum will be found useful. In attacking this
problem, we shall first of all replace the factorials in (11)
approximately by the first term of Stirling's formula for
the representation of large factorials.
This formula states that

(13)  n! = n^n e^{−n} (2πn)^{1/2} [1 + 1/(12n) + 1/(288n^2) + ....] .
To form an idea of the degree of approximation obtained
by using only the first term of this formula, we may say
that in replacing n! by n^n e^{−n}(2πn)^{1/2} we obtain a result
equal to the true value divided by a number between 1
and 1 + 1/(10n). The use of this first term is thus a suffi-
ciently close approximation for many purposes if n is fair-
ly large. The substitution by the use of Stirling's formula
for factorials in (11) gives, after some algebraic simplifica-
tion,

(14)  y_x = (2πspq)^{−1/2} [1 + x/(sp)]^{−(sp+x+1/2)} [1 − x/(sq)]^{−(sq−x+1/2)} ,

approximately.
To explain further our conditions of approximation to
(11), we naturally compare any individual discrepancy
x from the mathematical expectation ps with the standard
deviation σ = (spq)^{1/2}. We should note in this connection
that σ is of order s^{1/2} if neither p nor q is extremely small.
This fact suggests the propriety of assuming that s is so
large that x/s shall remain negligibly small, but that
x/s^{1/2} may take finite values such as interest us most
when we are making comparisons of a discrepancy with
the standard deviation. It is important to bear in mind
that we are for the present dealing with a particular kind
of approximation.
Under the prescribed conditions of approximation, we
shall now examine (14) with a view to obtaining a more
convenient form for y_x. For this purpose, we may write

(15)  log [1 + x/(sp)]^{−(sp+x+1/2)} = −(sp + x + 1/2)[x/(sp) − x^2/(2s^2p^2) + (x^3/s^3)φ(x)] ,

(16)  log [1 − x/(sq)]^{−(sq−x+1/2)} = (sq − x + 1/2)[x/(sq) + x^2/(2s^2q^2) + (x^3/s^3)φ_1(x)] ,

where φ(x) and φ_1(x) are finite because each of them
represents the sum of a convergent power series when x/s
is small at pleasure. From (14), (15), and (16),

y_x = (2πspq)^{−1/2} e^{−x^2/(2spq) + (x/s)φ_3(x)} ,

where φ_3(x) is clearly finite.
Now if s is so large that (x/s)φ_3(x) becomes small, we
have

y_x = (2πspq)^{−1/2} e^{−x^2/(2spq)}

as an approximation to y_x in (11).
As a first approximation to the sum of the ordinates
in (12), we then write the integral

(17)  (2πspq)^{−1/2} ∫_{−d}^{+d} e^{−x^2/(2spq)} dx .

This integral is commonly known as the probability
integral. The ordinates of the bell-shaped curve (Fig. 3)
represent the values of the function

y = [σ(2π)^{1/2}]^{−1} e^{−x^2/(2σ^2)} ,

where σ^2 = spq. This curve is the normal frequency curve
and will be further considered in Chapter III.
We may increase slightly the accuracy of our ap-
proximation by taking account of the fact that we have
one more ordinate in (12) than intervals of area. We may
therefore appropriately add an ordinate at x = d to the
value given in (17), and obtain

(18)  (2πspq)^{−1/2} ∫_{−d}^{+d} e^{−x^2/(2spq)} dx + (2πspq)^{−1/2} e^{−d^2/(2spq)}

for the probability that the discrepancy is between
−d and d inclusive of end points.
Another method of taking account of the extra
ordinate is to extend the limits of integration in (17) by
one-half the unit at both the upper and lower limits.
That is, we write

(19)  (2πspq)^{−1/2} ∫_{−d−1/2}^{+d+1/2} e^{−x^2/(2spq)} dx

in place of (17).
We may now state the De Moivre-Laplace theorem:
Given a constant probability p of success in each of s
trials where s is a large number, the probability that the
discrepancy m—sp of the number m of successes from the
mathematical expectation will not exceed numerically a
given positive number d is given to a first approximation by
(17) and to closer approximations by (18) and (19).
Although formulas (17), (18), and (19) assume s
large, it is interesting to experiment by applying these
formulas to cases in which s is not large. For example,
consider the problem of tossing six coins. The most prob-
able number of heads is 3, and the probability of a dis-
crepancy equal to or less than 1 is given exactly by

[6!/(3!3!) + 6!/(4!2!) + 6!/(2!4!)](1/64) = 25/32 ,

which is the sum of the probabilities that the number of
heads will be 2, 3, or 4 for s = 6 coins. But spq = 1.5, and
(spq)^{1/2} = 1.225. Then using −3/2 to 3/2 as limits of
integration in (19), we obtain from a table of the probabil-
ity integral the approximate value .779 to compare with
the exact value 25/32 = .781.
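The six-coin check reproduces readily; here the table of the probability integral is replaced by the error function, since the integral of the normal density with variance spq from −3/2 to 3/2 equals erf(1.5/√(2spq)):

```python
from math import comb, erf, sqrt

# Exact probability of 2, 3, or 4 heads in tossing six coins, against
# the normal approximation (19) with limits -3/2 to 3/2.
s, p = 6, 0.5
spq = s * p * (1 - p)                                  # 1.5
exact = sum(comb(s, m) for m in (2, 3, 4)) / 2**s      # 25/32

approx = erf(1.5 / sqrt(2 * spq))                      # about .779
```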
For certain purposes, there is an advantage in chang-
ing the variable x to t in (17) and (18) by the transforma-
tion

t = x/(spq)^{1/2} ,   δ = d/(spq)^{1/2} .

Then in place of (17) we have

(20)  P_δ = (2π)^{−1/2} ∫_{−δ}^{+δ} e^{−t^2/2} dt ,

and in place of (18) we have

(21)  P_δ = (2π)^{−1/2} ∫_{−δ}^{+δ} e^{−t^2/2} dt + (2πspq)^{−1/2} e^{−δ^2/2} .
To give a general notion of the magnitude of the
probabilities, we shall now list a few values of P_δ in (20)
corresponding to assigned values of δ. Thus,

δ....   .6745    1        2        3        4
P_δ..   .5      .68269   .95450   .99730   .99994
Extensive tables giving values of the probability
integral and of the ordinates of the probability curve are
readily available. For example, the Glover Tables of Ap-
plied Mathematics give P_δ/2 for the argument δ = x/σ.
Sheppard's table gives (1 + P_δ)/2 for the argument
δ = x/σ.
We may now state the De Moivre-Laplace theorem
in another form by saying that the values of P_δ in (20)
and (21) give approximations to the probability that
| m − sp | < δ(spq)^{1/2} for an assigned positive value of δ.
In still another slightly different form involving rela-
tive frequencies, we may state that the values of P_δ in
(20) and (21) give approximations to the probability that
the absolute value of the relative discrepancy satisfies the
inequality

(22)  | (m/s) − p | < δ(pq/s)^{1/2}

for every assigned positive value of δ.
In order to gain a fuller insight into the significance of
the De Moivre-Laplace theorem, we may draw the follow-
ing conclusions from (20): (a) Assuming, as is suggested
by (20), that a δ exists corresponding to every assigned
probability P_δ, we find from d = δ(spq)^{1/2} that the bounds
−d to +d increase in proportion to s^{1/2} as s is increased.
(b) From (20) and (22) it follows that for assigned prob-
abilities P_δ the bounds of discrepancy of the relative
frequency m/s from p vary inversely as s^{1/2}.
To illustrate the use of the De Moivre-Laplace
theorem, we take an example from the third edition of
American Men of Science by Cattell and Brimhall
(p. 804). A group of scientific men reported 1,705 sons
and 1,527 daughters. The examination of these numbers
brings up the following fundamental questions of simple
sampling. Do these data conform to the hypothesis that
1/2 is the probability that a child to be born will be a
boy? That is, can the deviations be reasonably regarded
as fluctuations in simple sampling under this hypoth-
esis? In another form, what is the probability in throw-
ing 3,232 coins that the number of heads will differ
from 3,232/2 = 1,616 by as much as, or more than,
1,705 − 1,616 = 89?
In this case,

s = 3,232 ,   (pqs)^{1/2} = 28.425 ,

d = 1,705 − 1,616 = 89 ,   d/(pqs)^{1/2} = 3.131 .

Referring to a table of the normal probability in-
tegral, we find from (20) that P = .9983. Hence, the prob-
ability that we will obtain a deviation more than 89 on
either side of 1,616 in a single trial is approximately
1 − .9983 = .0017.
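The arithmetic of this example can be reproduced with the error function standing in for the table of the probability integral:

```python
from math import erf, sqrt

# The Cattell-Brimhall example: 1,705 sons among 3,232 children,
# under the hypothesis p = 1/2, evaluated with formula (20).
s, p = 3232, 0.5
q = 1 - p
d = 1705 - 1616                  # observed discrepancy, 89
sd = sqrt(p * q * s)             # about 28.425
delta = d / sd                   # about 3.131

P = erf(delta / sqrt(2))         # formula (20), about .9983
tail = 1 - P                     # about .0017
```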
13. The quartile deviation. The discrepancy d which
corresponds to the probability P = 1/2 in (20) is some-
times called the quartile deviation, or the probable error of
m as an approximation to sp.
By the use of a table of the probability integral, it is
found from (20) that d = .6745(spq)^{1/2} approximately
when P = 1/2, and thus .6745(spq)^{1/2} is the quartile
deviation of the number of successes from the expecta-
tion sp.
14. The law of small probabilities. The Poisson ex-
ponential function. The De Moivre-Laplace theorem does
not ordinarily give a good approximation to the terms of
the binomial (p+q)^s if p or q is small. If s is large but
sp or sq is small in relation to s, we may give a useful
representation of terms of the binomial expansion (p+q)^s
by means of the Poisson exponential function. Statistical
examples of this situation are what may be called rare
events and may easily be given: the number born blind
per year in a city of 100,000, or the number dying per
year of a minor disease.
Poisson^{10} had already as early as 1837 given the function
involved in the treatment of the problem. Bortkiewicz^{11}
took up the problem in connection with a long
series of observations of events which occur rarely. For
example, one well-known series he gave was the frequency
distribution of the number of men killed per army corps
per year in the Prussian army from the kicks of horses.
The frequency distribution of the number of deaths per
army corps per year was:
Deaths    Frequency
  0          109
  1           65
  2           22
  3            3
  4            1
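As a check on this classical example (a sketch, not part of the original), the Poisson function developed below can be fitted by taking λ equal to the observed mean number of deaths per army corps per year:

```python
import math

deaths = [0, 1, 2, 3, 4]
freq   = [109, 65, 22, 3, 1]          # Bortkiewicz's 200 corps-year observations
n = sum(freq)

lam = sum(d * f for d, f in zip(deaths, freq)) / n   # mean = 0.61
expected = [n * math.exp(-lam) * lam**m / math.factorial(m) for m in deaths]
print(lam, [round(e, 1) for e in expected])
```

The fitted frequencies (about 108.7, 66.3, 20.2, 4.1, .6) lie close to those observed.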
He called the law of frequency involved the "law of small
numbers," and this name continues to be used although
it does not seem very appropriate. The expression "law of
small probabilities" seems to give a more accurate description.
Assume, then, that the probability p is small
and that q = 1 - p is nearly unity. That is, p is the
probability of the occurrence of the rare event in question in a
single trial.
We then seek a convenient expression approximately
equal to

\frac{s!}{m!\,n!}\,p^m q^n,

the probability of m occurrences and n non-occurrences
in m + n = s trials.
Replacing s! and n! by means of Stirling's formula we
obtain

\frac{(sp)^m e^{-m} q^n}{(1-m/s)^{s-m+1/2}\,m!}.

With large values of s and relatively small values of m,
(1-m/s)^{s-m+1/2} differs relatively little from (1-m/s)^s, and
this in turn differs relatively little from e^{-m}. Furthermore,
q^n = (1-p)^n differs very little from e^{-np} since, on
the one hand,

\log (1-p)^n = -n\left(p + \frac{p^2}{2} + \frac{p^3}{3} + \cdots\right),

and, on the other, the terms after -np in the exponent are
negligible when p is small. Introducing these approximations
by substituting e^{-m} for (1-m/s)^{s-m+1/2}, and e^{-np} for q^n, we have

\frac{(sp)^m e^{-np}}{m!}.
For rare events, of small probability p, np differs very
little from sp = \lambda. Hence, we write

(23) \frac{\lambda^m e^{-\lambda}}{m!}
for the approximate probability of m occurrences of the
rare event. Then the terms of the series

e^{-\lambda}\left(1+\lambda+\frac{\lambda^2}{2!}+\frac{\lambda^3}{3!}+\cdots\right)

give the approximate probabilities of exactly 0, 1, 2,
. . . . , occurrences of the rare event in question, and the
sum of the series

(24) e^{-\lambda}\left(1+\lambda+\frac{\lambda^2}{2!}+\cdots+\frac{\lambda^m}{m!}\right)

gives the probability that the rare event E will happen
either 0, 1, 2, . . . . , or m times in s trials.
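How good the approximation (23) is for a rare event can be seen by direct comparison with the exact binomial terms; the values s = 1,000 and p = .002 (so that λ = 2) below are illustrative choices, not from the text:

```python
import math

s, p = 1000, 0.002
lam = s * p   # lambda = sp = 2

binom   = [math.comb(s, m) * p**m * (1 - p)**(s - m) for m in range(6)]
poisson = [math.exp(-lam) * lam**m / math.factorial(m) for m in range(6)]
for m in range(6):
    print(m, round(binom[m], 5), round(poisson[m], 5))
```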
Although we have assumed in deriving the Poisson
exponential function \lambda^m e^{-\lambda}/m! that m is small in
comparison with s, we may obtain certain simple and interesting
results for the mathematical expectation and standard
deviation of the distribution given by the Poisson
exponential when m takes all integral values from m = 0
to m = s. Thus, when m = s in (24), we clearly have

(25) e^{-\lambda}\left(1+\lambda+\frac{\lambda^2}{2!}+\frac{\lambda^3}{3!}+\cdots+\frac{\lambda^s}{s!}\right) = 1

approximately if s is large.
Since the successive terms in (25) give approximately
the probabilities of 0, 1, 2, . . . . , s occurrences of the
rare event, the mathematical expectation \sum m P_m of the
number of such occurrences is

e^{-\lambda}\left(0+\lambda+\lambda^2+\frac{\lambda^3}{2!}+\cdots+\frac{\lambda^s}{(s-1)!}\right)
= \lambda e^{-\lambda}\left(1+\lambda+\frac{\lambda^2}{2!}+\cdots+\frac{\lambda^{s-1}}{(s-1)!}\right) = \lambda

approximately when s is large.
Similarly, the second moment \mu_2' about the origin is

\mu_2' = e^{-\lambda}\left[\lambda + \frac{2^2\lambda^2}{2!} + \frac{3^2\lambda^3}{3!} + \cdots + \frac{s^2\lambda^s}{s!}\right],

and the second moment about the mathematical expectation is

(26) \mu_2 = \lambda e^{-\lambda}\left[1+\lambda+\frac{\lambda^2}{2!}+\cdots+\frac{\lambda^{s-1}}{(s-1)!}\right]
+ \lambda^2 e^{-\lambda}\left[1+\lambda+\frac{\lambda^2}{2!}+\cdots+\frac{\lambda^{s-2}}{(s-2)!}\right] - \lambda^2
= \lambda + \lambda^2 - \lambda^2 \text{ nearly} = sp,

an approximation to spq since q differs but little from 1.
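Both results, expectation λ and second moment about the expectation approximately λ, are easy to confirm by direct summation; the truncation at m = 100 below stands in for the upper limit s (a sketch, not part of the original):

```python
import math

lam = 4.0
terms = [math.exp(-lam) * lam**m / math.factorial(m) for m in range(100)]

total = sum(terms)                                          # equation (25): ~1
mean  = sum(m * t for m, t in enumerate(terms))             # ~ lambda
mu2   = sum((m - mean)**2 * t for m, t in enumerate(terms)) # ~ lambda
print(round(total, 8), round(mean, 8), round(mu2, 8))
```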
Tables of the Poisson exponential limit e^{-\lambda}\lambda^x/x! are
given in Tables for Statisticians and Biometricians (pp.
113–24), and in Biometrika, Volume 10 (1914), pages 25–35.
The values of e^{-\lambda}\lambda^x/x! are tabulated to six places of
decimals for \lambda varying from .1 to 15 by intervals of one-tenth
and for x varying from 0 to 37.
A general notion of the values of the function for
certain values of \lambda may be obtained from Figure 4, where
the ordinates at 0, 1, 2, . . . . , show the values of the
function for \lambda = .5, 1, 2, and 5.
Miss Whittaker has prepared special tables (Tables
for Statisticians and Biometricians, pp. 122–24) which
facilitate the comparison of results from the Poisson
exponential with those from the De Moivre-Laplace
theory in dealing with the sampling fluctuations of
small frequencies. The question naturally arises
as to the value of p below which we should prefer to
use the Poisson exponential in dealing with the
probability of a discrepancy less than an assigned
number in place of the results of the De Moivre-Laplace
theory. While there is no exact answer to this question,
there seems to be good reason for certain purposes in
restricting the application of the De Moivre-Laplace
results to cases where the probability is perhaps not less
than p = .03.
To illustrate by a concrete situation in which p is
small, consider a case of 6 observed deaths from pneu-
monia in an exposure of 10,000 lives of a well-defined class
aged 30 to 31. It is fairly obvious, on the one hand, that
the possible variations below 6 are restricted to 6, whereas
there is no corresponding restriction above 6. On the
other hand, if we take 6/10,000 = 3/5,000 as the probability
of death from pneumonia within a year of a person
aged 30, it is more likely that we shall experience 5
deaths than 7 deaths among the 10,000 exposed; for the
probability

\binom{10{,}000}{5}\left(\frac{4{,}997}{5{,}000}\right)^{9{,}995}\left(\frac{3}{5{,}000}\right)^{5}

of 5 deaths is greater than the probability

\binom{10{,}000}{7}\left(\frac{4{,}997}{5{,}000}\right)^{9{,}993}\left(\frac{3}{5{,}000}\right)^{7}

of 7 deaths.
Suppose we now set the problem of finding the prob-
ability that upon repetition with another sample of
10,000, the deviation from 6 deaths on either side will not
exceed 3. The value to three significant figures calculated
from the binomial expansion is .854. To use the
De Moivre-Laplace theorem, we simply make d = 3 in
(19), and obtain from tables of probability functions the
value P_3 = .847.
We should thus expect from the De Moivre-Laplace
theorem a discrepancy either in defect more than 3 or in
excess more than 3 in 100 - 84.7 = 15.3 per cent of the
cases, and from the sum of the binomial terms we should
expect such a discrepancy in 100 - 85.4 = 14.6 per cent of
the cases.
Turning next to tables of the Poisson exponential,
page 122 of Tables for Statisticians and Biometricians, we
find that in 6.197 per cent of cases there will be a
discrepancy in defect more than 3 and in 8.392 per cent of
cases there will be a discrepancy in excess more than 3.
The sum of 6.197 and 8.392 per cent is 14.589 per cent.
This result differs very little for purposes of dealing with
sampling errors from the 15.3 per cent given by the
De Moivre-Laplace formula, but it is a closer approxima-
tion to the correct value and has the advantage of showing
separately the percentage of cases in excess more than
the assigned amount and the percentage in defect more
than the same amount.
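The three sets of figures in this comparison can be reproduced directly; exact binomial terms are computed below with math.comb (a sketch, not part of the original text):

```python
import math

n, p = 10_000, 3 / 5_000
lam = n * p            # = 6

def binom_term(k):
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_term(k):
    return math.exp(-lam) * lam**k / math.factorial(k)

# probability that the deviation from 6 deaths does not exceed 3 on either side
exact = sum(binom_term(k) for k in range(3, 10))       # ~ .854
defect = sum(poisson_term(k) for k in range(3))        # ~ .06197
excess = 1 - sum(poisson_term(k) for k in range(10))   # ~ .08392
print(round(exact, 3), round(defect, 5), round(excess, 5))
```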
CHAPTER III
FREQUENCY FUNCTIONS OF ONE VARIABLE
15. Introduction. In Chapter I we have discussed very
briefly three different methods of describing frequency
distributions of one variable — the purely graphic method,
the method of averages and measures of dispersion, and
the method of theoretical frequency functions or curves.
The weakness and inadequacy of the purely graphic meth-
od lies in the fact that it fails to give a numerical descrip-
tion of the distribution. While the method of averages
and measures of dispersion gives a numerical description
in the form of a summary characterization which is likely
to be useful for many statistical purposes, particularly for
purposes of comparison, the method is inadequate for
some purposes because (1) it does not give a characterization
of the distribution in the neighborhood of each
point x or in each small interval x to x + dx of the variable,
(2) it does not give a functional relation between the
values of the variable x and the corresponding frequencies.
To give a description of the distribution at each small
interval x to x+dx and to give a functional relation be-
tween the variable x and the frequency or probability we
require a third method, which may be described as the
"analytical method of describing frequency distribu-
tions." This method uses theoretical frequency functions.
That is, in this method of description we attempt to
characterize the given observed frequency distribution by
appealing to underlying probabilities, and we seek a
frequency function y = F(x) such that F(x)dx gives to within
infinitesimals of higher order the probability that a variate
x' taken at random falls in the interval x to x + dx.
Although the great bulk of frequency distributions
which occur so abundantly in practical statistics have cer-
tain important properties in common, nevertheless they
vary sufficiently to present difficult problems in consider-
ing the properties of F(x) which should be regarded as
fundamental in the selection of an appropriate function
to fit a given observed distribution.
The most prominent frequency function of practical
statistics is the normal or so-called Gaussian function

(1) y = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-x^2/2\sigma^2},

where \sigma is the standard deviation (see Fig. 3, p. 35).
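That the σ appearing in (1) is in fact the standard deviation can be confirmed by crude numerical integration; the value σ = 1.5 below is an arbitrary illustration (a sketch, not part of the original):

```python
import math

sigma = 1.5
f = lambda x: math.exp(-x**2 / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))

dx = 0.001
xs = [k * dx for k in range(-20_000, 20_000)]    # effectively the whole real line
area = sum(f(x) * dx for x in xs)                # total probability, ~1
mu2  = sum(x * x * f(x) * dx for x in xs)        # second moment about the origin
print(round(area, 6), round(math.sqrt(mu2), 6))
```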
Although Gauss made such noteworthy contributions
to error theory by the use of this function that his name
is very commonly attached to the function, and to the
corresponding curve, it is well known that Laplace made
use of the exponential frequency function prior to Gauss
by at least thirty years. It would thus appear that the
name of Laplace might more appropriately be attached
to the function than that of Gauss. But in a recent and
very interesting historical note, Karl Pearson finds that
De Moivre as early as 1733 gave a treatment of the
probability integral and of the normal frequency function.
The work of De Moivre antedates the discussion of Laplace
by nearly a half-century. Moreover, De Moivre's
treatment is essentially our modern treatment. Hence it
appears that the discovery of the normal function should
be attributed to De Moivre, and that his name might be
most appropriately attached to the function. It may well
be recalled that we obtained this function (1) in the De
Moivre-Laplace theory (p. 34). In (1) the origin is taken
so that the x-co-ordinate of the centroid of area under the
curve is zero. The approximate value of the centroid may
be obtained from a large number of observed variates by
finding their arithmetic mean. The \sigma is equal to the radius
of gyration of the area under the curve with respect
to the y-axis, and is obtained approximately from
observed variates by finding their standard deviation. The
probability or frequency function (1) has been derived
from a great variety of hypotheses.^{12} The difficulty is not
one of deriving this function but rather one of establish-
ing a high degree of probability that the hypotheses un-
derlying the derivation are realized in relation to practical
problems of statistics.
In the decade from 1890 to 1900, it became well estab-
lished experimentally that the normal probability func-
tion is inadequate to represent many frequency distribu-
tions which arise in biological data. To meet the situation
it was clearly desirable either to devise methods for char-
acterizing the most conspicuous departures from the
normal distributions or to develop generalized frequency
curves. The description and characterization of these de-
partures without the direct use of generalized frequency
curves has been accomplished roughly by the introduction
(see pp. 68-72) of measures of skewness and of peakedness
(excess or kurtosis), but the rationale underlying such
measures is surely to be sought most naturally in the
properties of generalized frequency functions. In spite of
the reasons which may thus be advanced for the study of
generalized frequency curves, it is fairly obvious that, for
the most part, the authors of the rather large number of
recent elementary textbooks on the methods of statistical
analysis seem to regard it as undesirable or impracticable
to include in such books the theory of generalized fre-
quency curves. The writer is inclined to agree with these
authors in the view that the complications of a theory of
generalized frequency curves would perhaps have carried
them too far from their main purposes. Nevertheless,
some results of this theory are important for elementary
statistics in providing a set of norms for the description
of actual frequency distributions. In order to avoid mis-
understanding it should perhaps be said that it is not
intended to imply that a formal mathematical representa-
tion of many numerical distributions is desirable, but
rather that a certain amount of such representation of
carefully selected distributions should be encouraged. A
useful purpose will be served in this connection if we can
make certain points of interest in the theory more accessi-
ble by means of the present monograph.
The problem of developing generalized frequency
curves has been attacked from several different directions.
Gram (1879), Thiele (1889), and Charlier (1905) in Scan-
dinavian countries; Pearson (1895) and Edgeworth
(1896) in England; and Fechner (1897) and Bruns (1897)
in Germany have developed theories of generalized fre-
quency curves from viewpoints which give very different
degrees of prominence to the normal probability curve in
the development of a more general theory. In the present
monograph, special attention will be given to two systems
of frequency curves — the Pearson system and the Gram-
Charlier system.
16. The Pearson system of generalized frequency
curves. Pearson's first memoir^{13} dealing with generalized
frequency curves appeared in 1895. In this paper he gave
four types of frequency curves in addition to the normal
curve, with three subtypes under his Type I and two
subtypes under his Type III. He published a supplementary
memoir^{14} in 1901 which presented two further types. A
second supplementary memoir^{15} which was published in
1916 gave five additional types. Pearson's curves, which
are widely different in general appearance, are so well
known and so accessible that we shall take no time to
comment on them as graduation curves for a great variety
of frequency distributions, but we shall attempt to indi-
cate the genesis of the curves with special reference to the
methods by which they are grounded on or associated
with underlying probabilities.
We shall consider a frequency function y = F(x) of one
variable where we assume that F{x)dx differs at most by
an infinitesimal of higher order from the probability that
a variate x taken at random will fall into the interval x
to x + dx. Pearson's types of curves y = F(x) are obtained
by integration of the differential equation

(2) \frac{dy}{dx} = \frac{(x+a)y}{c_0 + c_1 x + c_2 x^2},

and by giving attention to the interval on x in which
y = F(x) is positive. The normal curve is given by the
special case c_1 = c_2 = 0. We may easily obtain a clear view of
the genesis of the system of Pearson curves in relation to
laws of probability by following the early steps in the de-
velopment of equation (2). The development is started by
representing the probabilities of successes in n trials given
by the terms of the symmetric point binomial (1/2 + 1/2)^n
as ordinates of a frequency polygon. It is then easily
proved that the slope dy/dx of any side of this polygon,
at its midpoint, takes the form

(3) \frac{dy}{dx} = -k^2(x+a)y,

where y is the ordinate at this point, and a and k are
constants. By integration, we obtain the curve for which
this differential equation is true at all points. The curve
thus obtained is the normal curve (Pearson's Type VII).
The next step consists in dealing with the asymmetric
point binomial (p+q)^n, p \neq q, in a manner analogous to
that used in the case of the symmetric point binomial.
This procedure gives the differential equation

\frac{dy}{dx} = \frac{(x+a)y}{c_0 + c_1 x},

from which we obtain by integration the Pearson Type
III curve

(4) y = y_0\left(1+\frac{x}{a}\right)^{\gamma a} e^{-\gamma x}.
That is, with respect to the slope property, this curve
stands in the same relation to the values given by the
asymmetric binomial polygon as the normal curve does
to values given by the symmetric binomial.
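The slope property that generates (3) can be verified numerically: for the symmetric point binomial, the ratio of the slope of a side at its midpoint to the product of the midpoint ordinate and its distance x from the center is the constant -k^2 = -4/(n+1). A sketch, not part of the original:

```python
from math import comb

n = 20
y = [comb(n, r) / 2**n for r in range(n + 1)]   # ordinates of (1/2 + 1/2)^n

ratios = []
for r in range(n):
    x = r + 0.5 - n / 2                 # midpoint abscissa, measured from the mean
    slope = y[r + 1] - y[r]             # dy/dx over a unit interval
    mid = (y[r] + y[r + 1]) / 2         # midpoint ordinate
    ratios.append(slope / (mid * x))

print(ratios[0], -4 / (n + 1))          # every ratio equals -4/(n+1)
```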
Thus far the underlying probability of success has
been assumed constant. The next step consists in taking
up a probability problem in which the chance of success
is not constant, but depends upon what has happened
previously in a set of trials. Thus, the chance of getting
r white balls from a bag containing np white and nq black
balls in drawing s balls one at a time without replacements
is given by

(5) y_r = \frac{(np)!\,(nq)!\,(n-s)!\,s!}{(np-r)!\,(nq-s+r)!\,n!\,r!\,(s-r)!} = \binom{s}{r}\frac{(np)_r\,(nq)_{s-r}}{(n)_s},

where (n)_s means the number of permutations of n things
taken s at a time and \binom{s}{r} is the number of combinations
of s things taken r at a time. This expression is a term of a
hypergeometric series. By representing the terms of this
series as ordinates of a frequency polygon, and finding the
slope of a side of the frequency polygon, and proceeding
in a manner analogous to that used in the case of the
point binomial, we obtain a differential equation of the
form given in (2). Thus, we make r = 0, 1, 2, . . . . , s and
obtain the s + 1 ordinates y_0, y_1, y_2, . . . . , y_s at unit
intervals. At the middle point of the side joining the tops of
ordinates y_r and y_{r+1}, we have
(6) x = r + \frac{1}{2}, \qquad y = \frac{1}{2}(y_r + y_{r+1}),

and

(7) \frac{dy}{dx} = y_{r+1} - y_r = y_r\,\frac{s + nps - nq - 1 - r(n+2)}{(r+1)(r+1+nq-s)}.

From y = (y_r + y_{r+1})/2, we have

(8) y = y_r\,\frac{nps + nq + 1 - s + r(nq+2-np-2s) + 2r^2}{2(r+1)(r+1+nq-s)}.

From (7) and (8), replacing r by x - 1/2, we have

(9) \frac{1}{y}\frac{dy}{dx} = \frac{2s + 2nps - 2nq - 2 - (2x-1)(n+2)}{nps + nq + 1 - s + (x-\frac{1}{2})(nq+2-np-2s) + 2(x-\frac{1}{2})^2}.
From (9), we observe that the slope of the frequency
polygon, at the middle point of any side, divided by the
ordinate at that point is equal to a fraction whose numer-
ator is a linear function of x and whose denominator is a
quadratic function of x.
The differential equation (2) gives a general statement
of this property. It is more general than (9) in that the
constants of (9) are special values found from the law of
probability involved in drawings from a limited supply
without replacements. One of Pearson's generalizations
therefore consists in admitting as frequency curves all
those curves of which (2) is the differential equation with-
out the limitations on the values of the constants involved
in (9).
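Equation (9) can be checked directly against hypergeometric ordinates; the bag below (n = 40 balls, np = 15 white, nq = 25 black, s = 10 drawn) is a hypothetical illustration:

```python
from math import comb

n, s = 40, 10
npw, nqb = 15, 25    # numbers of white and black balls, npw + nqb = n

# hypergeometric ordinates y_r
y = [comb(npw, r) * comb(nqb, s - r) / comb(n, s) for r in range(s + 1)]

worst = 0.0
for r in range(s):
    x = r + 0.5                                   # midpoint of the side
    lhs = (y[r + 1] - y[r]) / ((y[r] + y[r + 1]) / 2)
    num = 2*s + 2*npw*s - 2*nqb - 2 - (2*x - 1)*(n + 2)
    den = npw*s + nqb + 1 - s + (x - 0.5)*(nqb + 2 - npw - 2*s) + 2*(x - 0.5)**2
    worst = max(worst, abs(lhs - num / den))
print("largest discrepancy:", worst)
```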
The questions involved in the integration of (2) and
in the determination of parameters for actual distribu-
tions are so available in Elderton's Frequency Curves and
Correlation, and elsewhere, that it seems undesirable to
take the space necessary to deal with these questions
here. The resulting types of equations and figures that
indicate the general form of the curves for certain positive
values of the parameters are listed below.
Type I (Fig. 5)

y = y_0\left(1+\frac{x}{a_1}\right)^{m_1}\left(1-\frac{x}{a_2}\right)^{m_2},

where m_1/a_1 = m_2/a_2.

Type II (Fig. 6)

y = y_0\left(1-\frac{x^2}{a^2}\right)^{m}.

Type III (Fig. 7)

y = y_0\left(1+\frac{x}{a}\right)^{\gamma a}e^{-\gamma x}.

Type IV

y = y_0\left(1+\frac{x^2}{a^2}\right)^{-m}e^{-v\tan^{-1}(x/a)}.

A skew curve of unlimited range at both ends, roughly
described in general appearance as a slightly deformed
normal curve (for the normal curve, see Fig. 3, p. 35).

Type V (Fig. 8)

y = y_0\,x^{-p}e^{-\gamma/x}.

Type VI (Fig. 9)

y = y_0(x-a)^{q_2}x^{-q_1}.

Type VII (Fig. 3, p. 35)

y = y_0\,e^{-x^2/2\sigma^2}.

The normal frequency curve.

Type VIII (Fig. 10)

y = y_0\left(1+\frac{x}{a}\right)^{-m}.

This type degenerates into an equilateral hyperbola
when m = 1.

Type IX (Fig. 11)

y = y_0\left(1+\frac{x}{a}\right)^{m}.

This type degenerates into a straight line when m = 1.

Type X (Fig. 12)

y = \frac{1}{\sigma}\,e^{-x/\sigma}.

This type is Laplace's first frequency curve while the
normal curve is sometimes called his second frequency
curve. The curve is shown for negative values of \pm x/\sigma.

Type XI (Fig. 13)

y = y_0\,x^{-m}.

Type XII (Fig. 14)

y = y_0\left(\frac{a_1+x}{a_2-x}\right)^{m}.
The above figures should be regarded as roughly illus-
trating only in a meager way, for particular positive val-
ues of the parameters, the variety of shapes that are
assumed by the Pearson type curves. For example, it is
fairly obvious that Types I and II would be U-shaped
when the exponents are negative, and that Type III
would be J-shaped if \gamma a were negative.
The idea of obtaining a suitable basis for frequency
curves in the probabilities given by terms of a hyper-
geometric series is the main principle which supports the
Pearson curves as probability or frequency curves, rather
than as mere graduation curves. That is to say, these
curves should have a wide range of applications as
probability or frequency curves if the distribution of statistical
material may be likened to distributions which arise under
the law of probability represented by terms of a
hypergeometric series, and if this law may be well expressed by
determining a frequency function y = F(x) from the slope
of the frequency polygon of the hypergeometric series.
In examining the source of the Pearson curves, the fact
should not be overlooked that the normal probability
curve can be derived from hypotheses containing much
broader implications than are involved in a slope condi-
tion of the side of a symmetric binomial polygon.
The method of moments plays an essential rôle in the
Pearson system of frequency curves, not only in the de-
termination of the parameters, but also in providing
criteria for selecting the appropriate type of curve. Pear-
son has attempted to provide a set of curves such that
some one of the set would agree with any observational
or theoretical frequency curve of positive ordinates by
having equal areas and equal first, second, third, and
fourth moments of area about a centroidal axis.
Let \mu_m be the mth moment coefficient about a centroid
vertical taken as the y-axis (cf. p. 19). That is, let

(10) \mu_m = \int_{-\infty}^{\infty} x^m F(x)\,dx,

where F(x) is the frequency function (see p. 13).
Next, let

\beta_1 = \mu_3^2/\mu_2^3 \quad \text{and} \quad \beta_2 = \mu_4/\mu_2^2 .

Then it is Pearson's thesis that the conditions \mu_0 = 1,
\mu_1 = 0, together with the equality of the numbers \mu_2, \beta_1,
and \beta_2, for the observed and theoretical curves lead to
equations whose solutions give such values to the
parameters of the frequency function that we almost invariably
obtain excellency of fit by using the appropriate one
of the curves of his system to fit the data, and that badness
of fit can be traced, in general, to heterogeneity of
data, or to the difficulty in the determination of moments
from the data as in the case of J- and U-shaped curves.
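The moment computations that feed these criteria are elementary; the small frequency table below is hypothetical (a sketch, not part of the original):

```python
values = [0, 1, 2, 3, 4, 5]
freq   = [3, 12, 25, 30, 20, 10]
N = sum(freq)

mean = sum(v * f for v, f in zip(values, freq)) / N
mu = lambda k: sum((v - mean)**k * f for v, f in zip(values, freq)) / N

beta1 = mu(3)**2 / mu(2)**3     # Pearson's skewness criterion
beta2 = mu(4) / mu(2)**2        # Pearson's peakedness (kurtosis) criterion
print(round(beta1, 4), round(beta2, 4))
```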
Let us next examine the nature of the criteria by
which to pass judgment on the type of curve to use in
any numerical case. Obviously, the form which the inte-
gral y = F(x) obtained from (2) takes depends on the
nature of the zeros of the quadratic function in the
denominator. An examination of the discriminant of this
quadratic function leads to equalities and inequalities
involving \beta_1 and \beta_2 which serve as criteria in the selection
of the type of function to be used. A systematic procedure
for applying these criteria has been thoroughly developed
and published in convenient form in Pearson's Tables for
Statisticians and Biometricians (1914), pages lx–lxx and
66–67; and in his paper in The Philosophical Transactions,
A, Volume 216 (1916), pages 429–57. The relations between
\beta_1 and \beta_2 may be conveniently represented by
curves in the \beta_1\beta_2-plane. Then the normal curve
corresponds to the point \beta_1 = 0, \beta_2 = 3 in this plane. Type III
is to be chosen when the point (\beta_1, \beta_2) is on the line
2\beta_2 - 3\beta_1 - 6 = 0; and Type V, when (\beta_1, \beta_2) is on the
cubic

\beta_1(\beta_2+3)^2 = 4(4\beta_2-3\beta_1)(2\beta_2-3\beta_1-6) .

In considering subtypes under Type I, a biquadratic
in \beta_1 and \beta_2 separates the area of J-shaped modeless curves
from the area of limited-range modal curves and the area
of U-shaped curves.
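A sketch (not from the text) of how the line and cubic criteria might be evaluated for a computed (β₁, β₂) point; the normal point (0, 3) lies on both loci as a limiting case:

```python
def type_iii_line(b1, b2):
    # zero when (b1, b2) lies on the Type III line
    return 2 * b2 - 3 * b1 - 6

def type_v_cubic(b1, b2):
    # zero when (b1, b2) lies on the Type V cubic
    return b1 * (b2 + 3)**2 - 4 * (4 * b2 - 3 * b1) * (2 * b2 - 3 * b1 - 6)

print(type_iii_line(0, 3), type_v_cubic(0, 3))   # both 0 at the normal point
```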
Without going further into detail about criteria for
the selection of the type of curve, we may summarize by
saying that curves traced on the \beta_1\beta_2-plane provide the
means of selecting the Pearson type of frequency curve
appropriate to the given distribution in so far as the
necessary conditions expressed by relations between \beta_1 and \beta_2
turn out to be sufficient to determine a suitable type of
curve.
The difficulties involved in the numerical computation
of the parameters of the Pearson curves were rather clear-
ly indicated in Pearson's original papers. The appropriate
tables and forms for computations in fitting the curves to
numerical distributions have been so available in various
books as to facilitate greatly the applications to concrete
data. Among such books and tables, special mention
should be made of Frequency Curves and Correlation
(1906), by W. P. Elderton, pages 5-105; Tables for Statis-
ticians and Biometricians (1924), by Karl Pearson; and
Tables of Incomplete Gamma Functions (1921), by the
same author.
17. Generalized normal curves — Gram-Charlier series.
Suppose some simple frequency function such as the
normal function or the Poisson exponential function
(p. 41) gives a rough approximation to a given frequency
distribution and that we desire a more accurate analytic
representation than would be given by the simple
frequency function. In this situation, it seems natural to
seek an analytical representation by means of the first
few terms of a rapidly convergent series of which the first
term, called the "generating function," is the simple
frequency function which gives the rough approximation.
Prominent among the contributors to the method of
the representation of frequency by a series may be named
Gram,^{16} Thiele,^{17} Edgeworth,^{18} Fechner,^{19} Bruns,^{20}
Charlier,^{21} and Romanovsky.^{22}
Our consideration of series for the representation of
frequency will be limited almost entirely to the Gram-Charlier
generalizations of the normal frequency function
and of the Poisson exponential function, by using these
functions as generating functions. These two types of
series may be written in the following forms:
Type A

(11) F(x) = a_0\phi(x) + a_3\phi^{(3)}(x) + \cdots + a_n\phi^{(n)}(x) + \cdots,

where

\phi(x) = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-(x-b)^2/2\sigma^2},

and \phi^{(n)}(x) is the nth derivative of \phi(x) with respect to x.

Type B

(12) F(x) = c_0\psi(x) + c_1\Delta\psi(x) + \cdots + c_n\Delta^n\psi(x) + \cdots,

where

\psi(x) = \frac{e^{-\lambda}\sin \pi x}{\pi}\left[\frac{1}{x} - \frac{\lambda}{x-1} + \frac{\lambda^2}{2!(x-2)} - \cdots\right] = \frac{e^{-\lambda}\lambda^x}{x!},

which is the Poisson exponential for non-negative integral
values of x, and where \Delta\psi(x), \Delta^2\psi(x), . . . . , denote
the successive finite differences of \psi(x) beginning with
\Delta\psi(x) = \psi(x) - \psi(x+1).
If Type A or Type B converges so rapidly that terms
after the second or third may be neglected, it is fairly
obvious that we have a simple analytic representation of
the distribution.
The general appearance of the curves represented by
two or three terms of Type A, for particular values of the
coefficients, is shown in Figure 15, so as to facilitate
comparison with the corresponding normal curve represented
by the first term. The curves shown there are

I. y = \phi(x), the normal curve;
II. y = \phi(x) + a_3\phi^{(3)}(x);
III. y = \phi(x) + a_3\phi^{(3)}(x) + a_4\phi^{(4)}(x).
A general notion of the values of the function
represented by the first term of Type B may be obtained for
particular values of \lambda from Figure 4, page 43. When \lambda is
taken equal to the arithmetic mean of the number of
occurrences of the rare event in question, we shall find
that c_1 = 0. We may then well inquire into the general
appearance of the graph of the function

y = \psi(x) + c_2\Delta^2\psi(x)

for particular values of c_2 and \lambda. For \lambda = 2 and c_2 = -.4,
see Figure 16, which shows also the corresponding \psi(x).
Fig. 16: I. y = \psi(x); II. y = \psi(x) - .4\Delta^2\psi(x), for \lambda = 2.
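The ordinates of the two curves of Figure 16 can be generated directly from the definitions, with Δψ(x) = ψ(x) − ψ(x+1) as in (12) (a sketch, not part of the original):

```python
import math

lam = 2.0
psi = lambda x: math.exp(-lam) * lam**x / math.factorial(x)

def d2psi(x):
    # second difference built from delta psi(x) = psi(x) - psi(x + 1)
    return psi(x) - 2 * psi(x + 1) + psi(x + 2)

c2 = -0.4
curve = [psi(x) + c2 * d2psi(x) for x in range(9)]   # curve II of Figure 16
print([round(v, 4) for v in curve])
```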
It should probably be emphasized that the usefulness
of a series representation of a given frequency distribu-
tion depends largely upon the rapidity of convergence.
In turn the rapidity with which the series converges de-
pends much upon the degree of approach of the generat-
ing function to the given distribution.
Although it is known^{23} that the Type A series is capable
of converging to an arbitrary function f(x) subject to
certain conditions of continuity and vanishing at infinity,
mere convergence is not sufficient for our problems. The
representation of an actual frequency distribution re-
quires, in general, such rapid convergence that only a few
terms will be found necessary for the desired degree of
approximation because (1) the amount of labor in compu-
tation soon becomes impracticable as the number of terms
increases and (2) the probable errors of high-order mo-
ments involved in finding the parameters would generally
be so large that the assumption that we may use moments
of observations for the theoretical moments will become
invalid.
18. Remarks on the genesis of the Type A and Type
B forms. We naturally ask why a generalization of the
normal frequency function should take the form of Type
A rather than some other form, say the product of the
generating function by a simple polynomial of low degree
in x or by an ordinary power series in x. A similar question
might be asked about the generalization of the Poisson
exponential function. There seems to be no very simple
answer to these questions. It is fair to say that algebraic
and numerical convenience, as well as suggestions
from underlying probability theory, have been significant
factors in the selection of Type A and Type B. The
algebraic and numerical convenience of Type A becomes fairly
obvious by following Gram in determining the parameters.
The suggestion of these forms in probability
theory is closely associated with the development of the
hypothesis of elementary errors (deviations) as given by
Charlier.^{21} A very readable discussion of the manner in
which the Type A series arises in the probability theory
of the distribution of a variate built up by the summation
of a large number of independent elements is given in the
recent book by Whittaker and Robinson on The Calculus
of Observations, pages 168–74.
In the present monograph, we shall limit our discussion
of the probability theory underlying Types A and B
to showing in Chapter VII that a certain line of development
of the binomial distribution suggests the use of the
Type A series as an extension of the ordinary De Moivre-
Laplace approximation, and the Type B series as an ex-
tension of the Poisson exponential approximation con-
sidered in Chapter II. This development is postponed to
the final chapter of the book because it involves more
formal mathematics than some readers may find it con-
venient to follow. Certain important results derived in
Chapter VII are stated without proof in §§ 19-21. While
a mastery of the details of Chapter VII is not essential
to an understanding of the results given in §§ 19-21, the
reader who can follow a formal mathematical develop-
ment without special difficulty may well read Chapter
VII at this point instead of reading §§ 19–21. In § 56 of
Chapter VII we follow closely the recent work of Wicksell^{24}
in the development of the forms of the Type A and
Type B series. Then in §§ 57–59 we deal with the principles
involved in the determination of the parameters in
these type forms.
19. The coefficients of the Type A series expressed in
moments of the observed distribution. If we measure x
from the centroid of area as an origin and with units equal
to the standard deviation, \sigma, we may write the Type A
series in the form

(13) F(x) = \phi(x) + a_3\phi^{(3)}(x) + a_4\phi^{(4)}(x) + \cdots + a_n\phi^{(n)}(x) + \cdots,

where

\phi(x) = \frac{1}{\sqrt{2\pi}}\,e^{-x^2/2},

and \phi^{(n)}(x) is the nth derivative of \phi(x) with respect
to x.
It will be shown in § 57 that the coefficients a_n for (n = 3, 4, . . . .) may then be expressed in the form

(14)  a_n = \frac{(-1)^n}{n!}\int_{-\infty}^{\infty} F(x)\,H_n(x)\,dx,

where

H_n(x) = x^n - \frac{n(n-1)}{2}\,x^{n-2} + \frac{n(n-1)(n-2)(n-3)}{2\cdot 4}\,x^{n-4} - \cdots

is a so-called Hermite polynomial.
To determine a_n numerically, we replace F(x) in (14) by the corresponding observed frequency function f(x), and replace x by x/\sigma if we measure x with ordinary units (feet, pounds, etc.) instead of using the standard deviation as the unit. Then we may write

(15)  a_n = \frac{(-1)^n}{n!}\int_{-\infty}^{\infty} f(x)\,H_n\!\left(\frac{x}{\sigma}\right)\frac{dx}{\sigma}.

Insert the values of H_n(x/\sigma) for n = 3, 4, 5 in (15), and we obtain coefficients in terms of moments as follows, using the symbol \alpha_j for the quotient \mu_j/\sigma^j:

a_3 = -\frac{\mu_3}{\sigma^3\,3!} = -\frac{\alpha_3}{3!},

a_4 = \frac{1}{\sigma^4\,4!}(\mu_4 - 3\sigma^4) = \frac{1}{4!}(\alpha_4 - 3),

a_5 = -\frac{1}{\sigma^5\,5!}(\mu_5 - 10\mu_3\sigma^2) = -\frac{1}{5!}(\alpha_5 - 10\alpha_3).
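The passage from an observed sample to the coefficients a_3, a_4, a_5 may be illustrated in modern computational form. The following Python sketch (the function name and the sample are ours, not the monograph's) simply evaluates the three moment formulas just derived:

```python
import math

def type_a_coefficients(data):
    """Coefficients a3, a4, a5 of the Type A series from the moment
    formulas of Section 19: a3 = -alpha3/3!, a4 = (alpha4 - 3)/4!,
    a5 = -(alpha5 - 10*alpha3)/5!."""
    n = len(data)
    mean = sum(data) / n
    mu = lambda j: sum((x - mean) ** j for x in data) / n   # central moment mu_j
    sigma = math.sqrt(mu(2))
    alpha = lambda j: mu(j) / sigma ** j                    # alpha_j = mu_j / sigma^j
    a3 = -alpha(3) / math.factorial(3)
    a4 = (alpha(4) - 3) / math.factorial(4)
    a5 = -(alpha(5) - 10 * alpha(3)) / math.factorial(5)
    return a3, a4, a5

# For a symmetrical sample the odd coefficients a3 and a5 vanish.
a3, a4, a5 = type_a_coefficients([-2, -1, 0, 1, 2])
```

For the symmetrical sample shown, a_3 and a_5 vanish, while a_4 is negative, the sample being flatter-topped than the normal curve.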
20. Remarks on two methods of determining the coefficients of the Type A series. It will be shown in § 57 that formula (14) for any coefficient a_n of the Type A series may be derived by making use of the fact that \phi^{(n)}(x) and the Hermite polynomials H_n(x) form a biorthogonal system. Then as indicated on page 168 we obtain a_n in terms of moments of the observed distribution.
As a second method of obtaining a_n in terms of the moments of the observed distribution f(x), it will be shown in § 58 that the values of the coefficients given in § 19 may be derived by imposing the least-squares criterion that

(16)  V = \int_{-\infty}^{\infty} \frac{1}{\phi(x)}\,[f(x) - F(x)]^2\,dx

shall be a minimum.
21. The coefficients of the Type B series. For the Type B series (12), we shall for simplicity limit the determination of coefficients to the first three terms. Moreover, we shall restrict our treatment to a distribution of equally distant ordinates at non-negative integral values of x. Then the problem is to find the coefficients c_0, c_1, c_2 in

F(x) = c_0\psi(x) + c_1\Delta\psi(x) + c_2\Delta^2\psi(x),

where

\psi(x) = \frac{e^{-\lambda}\lambda^x}{x!}   for x = 0, 1, 2, . . . .
By expressing the coefficients in terms of moments of the observed distribution as shown in § 59, we find

c_0 = 1,   c_1 = 0,   c_2 = \tfrac{1}{2}(\mu_2 - \lambda),

when \lambda is taken equal to the arithmetic mean of the given observed values.
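The structure of the three-term Type B form may be sketched numerically as follows. In this Python fragment (the function names and the backward-difference convention, with \psi taken as zero for negative x, are our assumptions; the operator \Delta is defined in Chapter VII), the differences telescope, so the total frequency over x = 0, 1, 2, . . . . reduces to c_0:

```python
import math

def psi(x, lam):
    """Poisson generating function psi(x) = e^(-lam) * lam^x / x!."""
    return math.exp(-lam) * lam ** x / math.factorial(x)

def d_psi(x, lam):
    """Backward difference of psi, with psi(x) = 0 for x < 0 (assumed convention)."""
    return psi(x, lam) - (psi(x - 1, lam) if x >= 1 else 0.0)

def d2_psi(x, lam):
    """Second backward difference of psi."""
    return d_psi(x, lam) - (d_psi(x - 1, lam) if x >= 1 else 0.0)

def type_b(x, lam, c0, c1, c2):
    """Three-term Type B series: c0*psi + c1*Delta(psi) + c2*Delta^2(psi)."""
    return c0 * psi(x, lam) + c1 * d_psi(x, lam) + c2 * d2_psi(x, lam)

# The differences sum to zero over all x, so the total frequency is c0.
total = sum(type_b(x, 2.0, 1.0, 0.0, 0.3) for x in range(60))
```

With c_0 = 1 the series thus has unit total frequency whatever c_1 and c_2 may be, the correction terms serving only to redistribute frequency among the ordinates.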
22. Remarks. With respect to the selection of Type A or Type B of Charlier to represent given numerical data, no criterion corresponding to the Pearson criteria has been given which enables one to distinguish between cases in which to apply one of these types in preference to the other; but Type B applies, in general, to certain decidedly skew distributions, and, in particular, to distributions of variates having a natural lower or upper bound with the modal frequency much nearer to such natural bound than to the other end of the distribution. For example, a frequency distribution of the number dying per month in a city from a minor disease would have the modal value near zero, the natural lower bound.
While the systematic procedure in fitting Charlier curves to data is not so well standardized as the methods used in fitting curves of the Pearson system to data, tables of \phi(t), where t is in units of standard deviation, of its integral from 0 to t, and of its second to eighth derivatives are given to five decimal places for the range t = 0 to t = 5 at intervals of .01 by James W. Glover, and tables of the function, its integral, and first six derivatives are given by N. R. Jorgensen to seven decimal places for t = 0 to t = 4.
23. Skewness. Charlier has fittingly called the coefficients a_3, a_4, a_5, . . . . , along with the mean and standard deviation, the characteristics of the distribution. The coefficients a_3 and a_4 may be interpreted so as to give characteristics which appear very significant in a description of a distribution to a general reader with little or no mathematical training. It is the common experience of those who have dealt with actual distributions of practical statistics that many of the distributions are not symmetrical. A measure is needed to indicate the degree of asymmetry or skewness of distributions in order that we may describe and compare the degrees of skewness of different distributions.

A measure of skewness is given by

(17)  S = -3a_3 = \frac{\mu_3}{2\sigma^3} = \tfrac{1}{2}\alpha_3.
Another measure of skewness is

(18)  S = \frac{\text{Mean} - \text{Mode}}{\sigma}.
In this latter measure we have adopted a convention as to sign by which the skewness is positive when the mean is greater than the mode. Some authors define skewness as equal numerically but opposite in sign to the value in our definition.

We may easily prove that the measures (17) and (18) are equal for a distribution given by the Pearson Type III curve, and approximately equal for a distribution given by the first two terms of the Gram-Charlier Type A when S as defined in (17) is not very large.
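In current notation the measure (17) is simply half the quotient \mu_3/\sigma^3, and it may be evaluated directly from a sample. The following Python sketch (function name and data illustrative only) shows that it is positive for a sample with a long right tail, where the mean exceeds the mode:

```python
def skewness_charlier(data):
    """Charlier's measure (17): S = mu3 / (2 sigma^3) = alpha3 / 2."""
    n = len(data)
    m = sum(data) / n
    mu2 = sum((x - m) ** 2 for x in data) / n
    mu3 = sum((x - m) ** 3 for x in data) / n
    return mu3 / (2 * mu2 ** 1.5)

# Long right tail: the modal value 1 lies below the mean 3, so S > 0.
s = skewness_charlier([1, 1, 1, 2, 2, 3, 5, 9])
```

The sign of S thus agrees with the convention adopted for (18): mean above mode, positive skewness.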
For the Pearson Type III (p. 54),

\frac{dy}{dx} = \frac{(x+a)y}{c_0 + c_1 x}.

When the parameters in this equation are expressed in moments about the mean, the equation takes the form

\frac{1}{y}\frac{dy}{dx} = \frac{x + \mu_3/2\sigma^2}{\mu_2 + \mu_3 x/2\sigma^2},

if the origin is at the mean of the distribution. The mode is the value of x for which dy/dx = 0; that is,

x = -\frac{\mu_3}{2\sigma^2}.

Hence

\frac{\text{Mean} - \text{Mode}}{\sigma} = \frac{\mu_3}{2\sigma^3},

and the measures (17) and (18) are equal for the Type III distribution.
For a distribution given by the first two terms of Type A, we are to consider the frequency curve

(19)  y = \phi(x) + a_3\phi^{(3)}(x).

We shall now prove that the distance from the mean (origin) to the mode is approximately -\sigma S when S is fairly small.

We have from (19)

\frac{1}{y}\frac{dy}{dx} = -\frac{x}{\sigma^2} - \frac{S}{\sigma} + \frac{Sx^2}{\sigma^3},

if we neglect terms in S^2. Then

-\frac{x}{\sigma^2} - \frac{S}{\sigma} + \frac{Sx^2}{\sigma^3} = 0

at the mode. Solving the quadratic for x we obtain x = -\sigma S if we neglect terms of the order S^3. Hence, the measures (17) and (18) are approximately equal for a distribution given by the first two terms of a Gram-Charlier Type A series.
24. Excess. In the general description of a given frequency distribution, we may add an important feature to the description by considering the relative number of variates in the immediate neighborhood of some central value such as the mean or the mode. That is, it would add to the description to give a measure of the degree of peakedness of a frequency curve fitted to a distribution by comparison with the corresponding normal curve fitted to the same distribution. The measure of the peakedness to which we shall now give attention is sometimes called the excess and sometimes the measure of kurtosis.

The excess or degree of kurtosis is measured by

E = 3a_4 = \frac{1}{8}\left(\frac{\mu_4}{\sigma^4} - 3\right) = \frac{1}{8}(\alpha_4 - 3).
If the excess is positive, the number of variates in the neighborhood of the mean is greater than in a normal distribution. That is, the frequency curve is higher or more peaked in the neighborhood of the mean than the corresponding normal curve with the same standard deviation. On the other hand, if the excess is negative, the curve is more flat-topped than the corresponding normal curve. To obtain a clearer insight into the relation of the measure of excess to the theoretical representation of frequency, let us consider a Gram-Charlier series of Type A to three terms

y = \phi(x) + a_3\phi^{(3)}(x) + a_4\phi^{(4)}(x),

which may be written

(20)  y = \frac{1}{\sigma(2\pi)^{1/2}}\,e^{-x^2/2\sigma^2}\left[1 + \frac{S}{3}\left(\frac{x^3}{\sigma^3} - \frac{3x}{\sigma}\right) + \frac{E}{3}\left(\frac{x^4}{\sigma^4} - \frac{6x^2}{\sigma^2} + 3\right)\right].

When we compare the ordinate of (20) at the mean x = 0 with the ordinate 1/\sigma(2\pi)^{1/2} at the mean for the normal curve, we observe that this ordinate exceeds the corresponding ordinate of the normal curve by E/\sigma(2\pi)^{1/2}. That is, the excess E is equal to the coefficient by which to multiply the ordinate at the centroid of the normal curve to get the increment to this ordinate as calculated by retaining the terms in \phi^{(3)}(x) and \phi^{(4)}(x) of the Type A series.
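The statement that the ordinate at the mean is (1 + E) times the normal ordinate may be verified numerically. In the following Python sketch (function names and parameter values ours), the bracketed factor of (20) is evaluated at x = 0, where the S term drops out and the E term contributes E\cdot 3/3 = E:

```python
import math

def type_a_bracket(x, sigma, S, E):
    """The bracketed factor of equation (20)."""
    u = x / sigma
    return 1 + (S / 3) * (u ** 3 - 3 * u) + (E / 3) * (u ** 4 - 6 * u ** 2 + 3)

def type_a_ordinate(x, sigma, S, E):
    """Ordinate of the three-term Type A curve (20)."""
    normal = math.exp(-x ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))
    return normal * type_a_bracket(x, sigma, S, E)

# At the mean x = 0 the ordinate is (1 + E) times the normal ordinate.
ratio = type_a_ordinate(0.0, 1.5, 0.2, 0.05) / (1 / (1.5 * math.sqrt(2 * math.pi)))
```

The ratio of the two ordinates at the mean is 1 + E, independently of the skewness S.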
25. Remarks on the distribution of certain transformed variates. Underlying our discussion of frequency functions, there has perhaps been an implication that the various types of distribution could be accounted for by an appropriate theory of probability. There may, however, be other than chance factors that produce significant effects on the type of the distribution. Such effects may in certain cases be traced to their source by regarding the variates of a distribution as the results of transformations of the variates of some other type of distribution. Edgeworth was prominent in thus regarding certain distributions. For simple examples, we may think of the diameters, surfaces, and volumes of spheres that represent objects in nature, such as oranges on a tree or peas on a plant. Suppose the distribution of diameters is a normal distribution. It seems natural to inquire into the nature of the distribution of the corresponding surfaces and volumes. The partial answer to the inquiry is that these are distributions of positive skewness. The same kind of problem would arise if we knew that velocities, v, of molecules of gas were normally distributed, and were required to investigate the distribution of energies mv^2/2.
To illustrate somewhat more concretely with actual data, it may be observed in looking over the frequency distributions of the various subgroups on build of men, in Volume I of the Medico-Actuarial Mortality Investigation, that the distributions with respect to weight are, in general, not so nearly symmetrical as the distributions as to height. In fact, the distributions as to weight exhibit marked positive skewness. For example, in the age group 25 to 29 and height 5 feet 6 inches we find the following distribution:

W   105   120    135    150   165   180   195   210
F    17   722  2,175  1,346   485   155    33     3

where W = weight in pounds, F = frequency.
A similar feature had been observed by the writer in examining many frequency distributions of ears of corn with respect to length of ears and weight of ears. The distributions as to weight showed this tendency to positive skewness, whereas the distributions as to lengths of ears were much more nearly symmetrical. It seems natural to assume that the weights of bodies are closely correlated with volumes. We may next take account of the fact that volumes of similar solids vary as the cubes of like linear dimensions.
Such concrete illustrations suggest the investigation of the equation of the frequency curve of values obtained by the transformation of variates of a normal distribution by replacing each variate x of the normal distribution by an assigned function of the form kx^m, where k is a positive constant and m is a positive integer or the reciprocal of a positive integer. A paper on this subject by the writer appeared in the Annals of Mathematics in June, 1922. The skewness observed in the distributions of weights is similar to the skewness which results as the effect of this transformation when m is a positive integer.
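The positive skewness produced by such a transformation is readily exhibited by simulation. In this Python sketch (all parameter values are arbitrary), normally distributed "diameters" are carried into "volumes" by kx^3 with k = \pi/6, and the moment measure of skewness increases markedly:

```python
import math
import random

def alpha3(data):
    """Moment measure of skewness, mu3 / sigma^3."""
    n = len(data)
    m = sum(data) / n
    mu2 = sum((x - m) ** 2 for x in data) / n
    mu3 = sum((x - m) ** 3 for x in data) / n
    return mu3 / mu2 ** 1.5

random.seed(7)
diameters = [random.gauss(10.0, 1.0) for _ in range(20000)]   # nearly symmetrical
volumes = [(math.pi / 6) * d ** 3 for d in diameters]         # k x^3, k = pi/6
```

The simulated diameters are nearly symmetrical, while the cubed values show decided positive skewness, in accord with the behavior of the weight distributions noticed above.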
From a different standpoint, S. D. Wicksell in the Arkiv för Matematik, Astronomi och Fysik in 1917 has discussed, by means of a generalized hypothesis about elementary errors, a connection between certain functions of a variate and a genetic theory of frequency. The hypotheses involved in this theory are at least plausible in their relation to certain statistical phenomena. There are thus at least two points of view which indicate that the method which uses variates resulting from transformation may rise above the position of a device for fitting distributions and be given a place in the theory of frequency. A recent paper by E. L. Dodd presents a somewhat critical study of the determination of the frequency law of a function of variables with given frequency laws, and another recent paper by S. Bernstein deals with appropriate transformations of variates of certain skew distributions.
26. Remarks on the use of various frequency functions as generating functions in a series representation. In the Handbook of Mathematical Statistics (1924), page 116, H. C. Carver called attention to certain generating functions designed to make frequency series more rapidly convergent than the Type A series. In a paper published in 1924 on the "Generalization of Some Types of the Frequency Curves of Professor Pearson" (Biometrika, pp. 106-16), Romanovsky has used Pearson's frequency functions of Types I, II, and III as the generating functions of infinite series in which these types are involved in a manner analogous to the way in which the normal probability function is involved in the Gram-Charlier series. When Type I,

y = y_0\left(1 + \frac{x}{a_1}\right)^{m_1}\left(1 - \frac{x}{a_2}\right)^{m_2},

is used as a generating function, certain functions \phi_k, which are polynomials of Jacobi in slightly modified form, occur in the expansion in a way analogous to that in which the Hermite polynomials occur in the Gram-Charlier expansion. Moreover, the analogy is continued because the products of the generating function by the \phi_k, and the \phi_k themselves, form a biorthogonal system, and this property facilitates the determination of the coefficients in the series.
When the Type III function (p. 54) is used as a generating function, certain functions \phi_k, which are polynomials of Laguerre in generalized form, play a rôle similar to that of the polynomials of Hermite in the Gram-Charlier expansion.

While it is at least of theoretical interest that various frequency functions may assume rôles in the series representation of frequency somewhat similar to the rôle of the normal frequency function in the Gram-Charlier theory, the fact should not be overlooked that the usefulness of any series representation in applications to numerical data is much restricted by the requirement of such rapid convergence of the series that only a few terms need be taken to obtain a useful approximation.
CHAPTER IV
CORRELATION
27. The meaning of simple correlation. Suppose we have data consisting of N pairs of corresponding variates (x_i, y_i), i = 1, 2, . . . . , N. The given pairs of values may arise from any one of a great variety of situations. For
example, we may have a group of men in which x represents the height of a man and y his weight; we may have a group of fathers and their oldest sons in which x is the stature of a father and y that of his oldest son; we may have minimal daily temperatures in which x is the minimal daily temperature at New York and y the corresponding value for Chicago; we may be considering the effect of nitrogen on wheat yield where x is pounds of nitrogen applied per acre and y the wheat yield; we may be throwing two dice where x is the number thrown with the first die and y the number thrown with the two dice together.
If such a set of pairs of variates is represented by dots marking the points whose rectangular co-ordinates are (x, y), we obtain a so-called "scatter-diagram" (Fig. 17).

Assume next that we are interested in a quantitative
characterization of the association of the x's and the corresponding y's. One of the most important questions which can be considered in such a characterization is that of the connection, or correlation as it is called, between the two sets of values. It is fairly obvious from the scatter-diagram that, with values of x in an assigned interval dx (dx small), the corresponding values of y may differ considerably, and thus the y corresponding to an assigned x cannot be given by the use of a single-valued function of x. On the other hand, it may be easily shown that in certain cases, for an assigned x larger than the mean value of x's, a corresponding y taken at random is much more likely to be above than below the mean value of y's. In other words, the x's and y's are not independent in the probability sense of independence. There is often in such situations a tendency for the dots of the scatter-diagram to fall into a sort of band which can be fairly well described. In short, there exists an important field of statistical dependence and connection between the regions of perfect dependence given by a single-valued mathematical function at one extreme and perfect independence in the probability sense at the other extreme. This is the field of correlated variables, and the problems in this field are so varied in their character that the theory of correlation may properly be regarded as an extensive branch of modern methodology.
28. The regression method and the correlation surface method of describing correlation. It may help to visualize the theory of correlation if we point out two fundamental ways of approach to the characterization of a distribution of correlated variables, although the two methods have much in common. The one may be called the "regression method," and the other the "correlation surface method."

Let us assume that the pairs of variates (x, y) are represented by dots of a scatter-diagram, and set the problem of characterizing the correlation. First, separate the dots into classes by selecting class intervals dx. When we restrict the x's to values in such an interval dx, the set of corresponding y's is called an x-array of y's or simply an array of y's. Similarly, when we restrict the assignment of y's to a class interval dy, the corresponding set of x's is called a y-array of x's or simply an array of x's. The whole set of arrays of a variable, say of y, is often called a set of parallel arrays.
The regression curve y = f(x) of y on x for a population is defined to be the locus of the expected value (§ 6) of the variable y in the array which corresponds to an assigned value of x, as dx approaches zero. In other words, the regression curve of y on x is the locus of the means of arrays of y's of the theoretical distribution, as dx approaches zero.
These equivalent definitions relate to the ideal popula-
tion from which a sample is to be drawn. The regression
curve found from a sample is merely a numerical approxi-
mation to the ideal set up in the definition.
In the regression method, our first interest is in the regression curves of y on x and of x on y. We are interested next in the characterization of the distribution of the values of y (array of y's) whose expected or average value we have predicted. This is accomplished to some extent by means of measures of dispersion of the values of y which correspond to an assigned value of x. To illustrate the regression method by reference to the correlation between statures of father and son, we may say that the first concern in the use of the regression method is with predicting the mean stature of a subgroup of men whose fathers are of any assigned height, and the next concern is with predicting the dispersion of such a subgroup. The complete characterization of the theoretical distributions underlying arrays of y's may be regarded as the complete solution of the problem of the statistical dependence of y on x.
In the correlation surface method for the two variables, our primary interest is in the characterization of the probability \phi(x, y)\,dx\,dy that a pair of corresponding variates (x, y) taken at random will fall into the assigned rectangular area bounded by x to x+dx and y to y+dy. This method may be regarded as an extension to functions of two or more variables of the method of theoretical frequency functions of one variable. To get at the meaning of correlation by this method, suppose that a function g(x) is such that g(x)dx gives, to within infinitesimals of higher order, the probability that a variate x taken at random lies between x and x+dx; and suppose that h(x, y)dy gives similarly the probability that a variate y taken at random from the array of values which correspond to values of x in the interval x to x+dx will lie between y and y+dy. Then the probability that the two events will both happen is given by the product

(1)  \phi(x, y)\,dx\,dy = g(x)h(x, y)\,dx\,dy.

For the probability that both of two events will happen is the product of the probability that the first will happen, multiplied by the probability that the second will happen when the first is known to have happened.
Two cases occur in considering this product. In the first case, h(x, y) is a function of y alone. When this is the case we say the x and y variates are uncorrelated, and \phi(x, y) is simply the product of a function of x only multiplied by a function of y only. In such a case the probability that a variate y will be between y and y+dy is the same whether the corresponding assigned x be large or small. In the second case, h(x, y) is a function of both x and y. In such cases, the probability that a variate y will be between y and y+dy is not, in general, the same for corresponding assigned large and small values of x. In such cases the two systems of variates are said to be correlated. Thus, in considering for example a group of college students, the height of a student is probably uncorrelated with the grades he makes in mathematics or with the income of his father, but his height is correlated with his weight, and with the height of his father.
Both the regression method and the correlation surface method of dealing with correlation have been in evidence almost from the earliest contributions to the subject. The early method of Francis Galton was essentially the regression method, but the mathematical solution of the special problem which he proposed to J. D. Hamilton Dickson in 1886 consisted in giving the equation of the normal frequency surface to correspond to given lines of regression. The solution of this problem thus involved the correlation surface method. Furthermore, the early contributions of Karl Pearson to correlation theory, involving the influence of selection, stressed frequency surfaces more than regression equations. But, beginning with a paper by G. Udny Yule in 1897, the theory has been developed without limitation to a particular type of frequency surface. It is a fact of some interest that Yule returned very closely to the primary ideas of Galton, by placing the emphasis on the lines of regression. Moreover, the success of the regression method of approach should give us an insight into the simplicity and fundamental character of Galton's original ideas.
29. The correlation coefficient. The degree of correlation is often measured by the Pearsonian coefficient of correlation represented by the letter r. Consider N pairs of variates (x_i, y_i), i = 1, 2, . . . . , N, such as are described above, and let (\bar{x}, \bar{y}) represent the corresponding arithmetic means of x's and y's. Then

\sigma_x = \left[\frac{1}{N}\sum_{i=1}^{N}(x_i - \bar{x})^2\right]^{1/2},   \sigma_y = \left[\frac{1}{N}\sum_{i=1}^{N}(y_i - \bar{y})^2\right]^{1/2}

are the standard deviations of the two series.
Assuming that at least two of the x's are unequal so that \sigma_x \neq 0, we let any variate which is denoted by x_i in original units (yards, miles, pounds, dollars) be denoted by x'_i when measured from the mean \bar{x} with the standard deviation \sigma_x as a unit. Similarly, let the value of y_i be denoted by y'_i when measured from the mean \bar{y} with \sigma_y as a unit. That is,

x'_i = (x_i - \bar{x})/\sigma_x,   y'_i = (y_i - \bar{y})/\sigma_y.

Then in terms of x'_i and y'_i, the correlation coefficient is given by the simple formula

(2)  r = \frac{1}{N}\sum_{i=1}^{N} x'_i\,y'_i.
That is, the correlation coefficient of two sets of variates, expressed with their respective standard deviations as units, may be defined as the arithmetic mean of the products of deviations of corresponding values from their respective arithmetic means.

We have defined the correlation coefficient r for a sample. The expected value of the right-hand member of (2) in the sampled population is then the correlation coefficient for the population.
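Formula (2) translates directly into a short computation. In the following Python sketch (function name and data illustrative only), r is the mean product of the standardized deviations; it attains 1 for an exact positive linear relation and otherwise lies between -1 and 1:

```python
import math

def correlation(xs, ys):
    """Pearsonian r of formula (2): the mean of the products x'_i y'_i
    of deviations measured in units of the standard deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / n)
    sy = math.sqrt(sum((y - my) ** 2 for y in ys) / n)
    return sum(((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys)) / n

r_exact = correlation([1, 2, 3, 4], [2, 4, 6, 8])    # exact linear relation
r_loose = correlation([1, 2, 3, 4], [1, 3, 2, 5])    # imperfect association
```

That r cannot fall outside the interval from -1 to 1 is proved below by formulas (5) and (6).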
While the formula (2) is very useful for the purpose of giving the meaning of the correlation coefficient, other formulas easily obtained from (2) are usually much better adapted to numerical computation. For example,

(3)  r = \frac{\frac{1}{N}\sum x_i y_i - \bar{x}\bar{y}}{\sigma_x \sigma_y},

(4)  r = \frac{\sum x_i y_i - N\bar{x}\bar{y}}{\left[\left(\sum x_i^2 - N\bar{x}^2\right)\left(\sum y_i^2 - N\bar{y}^2\right)\right]^{1/2}}

are ordinarily more convenient than (2) for purposes of computation.

When N is small, say < 30, formula (4) is readily applied. When N is large, appropriate forms for the calculation of r are available in various books.

Still other forms for expressing r are useful for certain purposes. For example, for the purpose of showing that -1 \leq r \leq 1, we shall now give two further formulas for r.
for r.
By simple algebraic verification, and remembering that 1 = \sum x_i'^2/N = \sum y_i'^2/N, it follows that (2) may be written in the forms

(5)  r = 1 - \frac{1}{2N}\sum (x'_i - y'_i)^2,

(6)  r = -1 + \frac{1}{2N}\sum (x'_i + y'_i)^2.

From these two formulas, we have the important proposition that

(7)  -1 \leq r \leq 1.
THE REGRESSION METHOD OF DESCRIPTION
30. Linear regression. Suppose we are interested in the mean value \bar{y}_x of the y's in the x-array of y's. The simplest and most important case to consider from the standpoint of the practical problems of statistics is that in which the regression of y on x is a straight line. Assuming that the regression curve of y on x in the population is a straight line, we accept as an approximation the line \bar{y}_x = mx + b which fits "best" the means of arrays of the sample.

The term "best" is here used to mean best under a least-squares criterion of approximation. In applying the criterion the square (\bar{y}_x - mx - b)^2 for each array is weighted with the number in the array. Let N_x be the number of dots in any assigned x-array of y's. Then the equation of our line of regression would be

(8)  \bar{y}_x = mx + b,

where m and b are to be determined by the condition that the sum

(9)  \sum N_x(\bar{y}_x - mx - b)^2,
with observed data substituted for x, \bar{y}_x, and N_x from all arrays, is to be a minimum. Differentiating (9) with respect to b and m, we have

(10)  -2\sum N_x(\bar{y}_x - mx - b) = 0,

(11)  -2\sum N_x(\bar{y}_x - mx - b)x = 0.

We may note that N_x\bar{y}_x is equal to the sum of all y's in an array of y's. If we examine these equations on making substitutions for \bar{y}_x and x, it is easily seen that they are, except for grouping errors which vanish as dx \to 0, equivalent to the equations

(12)  -2\sum (y_i - mx_i - b) = 0,

(13)  -2\sum x_i(y_i - mx_i - b) = 0,

where the summation is extended to all the given pairs. That is, we may find the regression line by obtaining the linear function y = mx + b which gives the best least-square estimate of the values of y which correspond to assigned values of x. Take the origin at the mean of x's and the mean of y's. Then \sum y_i = 0, \sum x_i = 0. Hence, from (12), b = 0. From (13),

m = \frac{\sum x_i y_i}{\sum x_i^2} = r\frac{\sigma_y}{\sigma_x},

and the equation of the line of regression of y on x is

(14)  y = r\frac{\sigma_y}{\sigma_x}\,x.
Similarly, the line of regression of x on y is

(15)  x = r\frac{\sigma_x}{\sigma_y}\,y.

It should be remembered that the origin is at the mean values of x's and of y's when the regression equations take the forms (14) and (15). It is obvious that these equations may be written as

(16)  y - \bar{y} = r\frac{\sigma_y}{\sigma_x}(x - \bar{x})

and

(17)  x - \bar{x} = r\frac{\sigma_x}{\sigma_y}(y - \bar{y})

when we take any arbitrary origin.

The coefficient r\sigma_y/\sigma_x is called the regression coefficient of y on x, and similarly r\sigma_x/\sigma_y is the regression coefficient of x on y.

If we use standard deviations as units of measurement, the regression equations (14) and (15) become

(18)  y' = rx',   x' = ry',

and the regression coefficients are equal to each other and to the correlation coefficient.
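The determination of m and b from the normal equations (12) and (13) may be sketched computationally as follows (function name and data ours). With the means subtracted, the slope reduces to \sum x_i y_i / \sum x_i^2 = r\sigma_y/\sigma_x, as derived above:

```python
def regression_y_on_x(xs, ys):
    """Slope and intercept of the line of regression of y on x, obtained
    from the least-squares normal equations (12) and (13)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    m = sxy / sxx            # equals r * sigma_y / sigma_x
    b = my - m * mx          # b = 0 when the origin is taken at the means
    return m, b

m, b = regression_y_on_x([1, 2, 3, 4], [3, 5, 7, 9])   # data lie on y = 2x + 1
```

When the data lie exactly on a line, the computed slope and intercept reproduce that line, and r = 1.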
When there is no correlation between x's and y's, r = 0, and the regression lines of y on x and of x on y are parallel to the x- and y-axes, respectively. On the other hand, when r = 0, it is not necessarily true that there is no correlation. Indeed, there may be a high correlation with non-linear regression when r = 0. For example, we may have r = 0 when y is a simple periodic function of x.
31. The standard deviation of arrays — mean square error of estimate. In passing judgment on the degree of precision to be expected in estimating the value of a variable, say y, by means of the regression equation of y on x, it is important to have a measure of the dispersion in arrays of y's.

The mean square error s_y^2 involved in taking the ordinates of the line of regression as the estimated values of y may be very simply expressed by s_y^2 = \sigma_y^2(1 - r^2). To prove that s_y^2 takes this value, we may write the sum of the squares of deviations in the form

\sum\left(y_i - r\frac{\sigma_y}{\sigma_x}x_i\right)^2 = \sum y_i^2 - 2r\frac{\sigma_y}{\sigma_x}\sum x_i y_i + r^2\frac{\sigma_y^2}{\sigma_x^2}\sum x_i^2 = N\sigma_y^2(1 - r^2).

Hence, we have

(19)  s_y^2 = \sigma_y^2(1 - r^2),

(20)  s_y = \sigma_y(1 - r^2)^{1/2}.

This value of s_y may be regarded as a sort of average value of the standard deviations of the arrays of y's, and is sometimes called the root-mean-square error of estimate of y or, more briefly, the standard error of estimate of y. The factor (1 - r^2)^{1/2} in (20) has been called the coefficient of alienation, or the measure of the failure to improve the estimate of y from knowledge of the correlation.
When the standard deviation of an array of y's is regarded as a function, say S(x), of the assigned x, the curve y = S(x)/\sigma_y is called the scedastic curve. It may be described as the curve whose ordinates measure the scatter in arrays of y's in comparison to the scatter of all y's. When S(x) is a constant, the regression system of y on x is called a homoscedastic system. When S(x) is not constant, the system is said to be heteroscedastic. For a homoscedastic system with linear regression, s_y = \sigma_y(1 - r^2)^{1/2} is the standard deviation of each array of y's.
To illustrate (20) numerically, let us suppose that r = .5 gives the correlation of statures of fathers and sons. Assuming linear regression, the root-mean-square error of estimate of the height of a son derived from the assigned height of the father would be

s_y = \sigma_y[.75]^{1/2} = .866\sigma_y.

That is, the average dispersion in the arrays of heights of sons which correspond to assigned heights of fathers is about .87 as great as the dispersion of the heights of all the sons. It is, therefore, fairly obvious that we cannot, with any considerable degree of reliability, predict from r = .5 the height of an individual son from the height of the father. However, with a large N, we can give a very reliable prediction of the mean heights of sons that correspond to assigned heights of fathers.
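The numerical illustration above is a one-line computation from (20); the following sketch (function name ours) reproduces the factor .866 for r = .5:

```python
import math

def std_error_of_estimate(sigma_y, r):
    """Formula (20): s_y = sigma_y * (1 - r^2)^(1/2)."""
    return sigma_y * math.sqrt(1 - r ** 2)

s = std_error_of_estimate(1.0, 0.5)   # about .866 of sigma_y
```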
It should be remembered that we have thus far assumed linear regression of y on x. An analogous consideration of the dispersion in arrays of x's gives for the mean square error of estimate

s_x^2 = \sigma_x^2(1 - r^2),

when we assume linear regression of x on y.
32. Non-linear regression — the correlation ratio. In case a curve of regression, say of y on x, is not a straight line, the correlation coefficient as a measure of correlation may be misleading. In introducing a correlation ratio, \eta_{yx}, of y on x as an appropriate measure of correlation to take the place of the correlation coefficient in such a situation, we may get suggestions as to what is appropriate by solving for r^2 in (19). This gives

(21)  r^2 = 1 - s_y^2/\sigma_y^2,

where we may recall that s_y^2 is the mean square of deviations from the line of regression. Then

r = \pm(1 - s_y^2/\sigma_y^2)^{1/2}.

This formula could be used appropriately as a definition of r in place of our definition in (2), and its examination may throw further light on the significance of r. When s_y = 0, the formula gives r = \pm 1 and, as we have seen earlier, all the dots of the scatter-diagram must then fall exactly on the line of regression y = r\sigma_y x/\sigma_x. When s_y = \sigma_y, the formula gives r = 0, and the regression line is in this case of no aid in predicting the value of y from assigned values of x. In the formula r^2 = 1 - s_y^2/\sigma_y^2 it is important to keep in mind that the mean square deviation s_y^2 is from the line of regression (§ 31). Next, let S_y^2 be the corresponding mean square of deviations from the means of arrays. Then in the population S_y^2 = s_y^2 when the regression is strictly linear, but S_y^2 \neq s_y^2 when the regression is non-linear. This fact suggests the use of a formula closely related to [1 - s_y^2/\sigma_y^2]^{1/2} for a measure of non-linear regression by replacing s_y by S_y. We then write

(22)  \eta_{yx}^2 = 1 - S_y^2/\sigma_y^2,
90 CORRELATION
where $\eta_{yx}$ is the correlation ratio of $y$ on $x$, and $\bar{s}_y^2$ is the
mean square of deviations from the means of arrays
whether these means are near to or far from the line
of regression. For linear regression of $y$ on $x$, we have
$\eta_{yx}^2 = r^2$ in the population.
In general, we may say that the correlation ratio of
$y$ on $x$ is a measure of the clustering of dots of the scatter-diagram
about the means of arrays of $y$'s.
An analogous discussion for the arrays of $x$'s obviously
leads to

$\eta_{xy}^2 = 1 - \bar{s}_x^2/\sigma_x^2$ ,

giving $\eta_{xy}$, the correlation ratio of $x$ on $y$.
That $\eta_{yx}^2 \le 1$, and that the equality holds only when
all the dots in each array are at the mean of the array,
follows at once from (22).
That $\eta_{yx}^2 \ge r^2$ may be shown by recalling the meanings of
$s_y^2$ in (21) and of $\bar{s}_y^2$ in (22). A mean square of deviations
in each array is a minimum when the deviations are taken
from the mean of the array. Hence, the $\bar{s}_y^2$ in (22) must
be equal to or less than $s_y^2$ in (21) for the same data, since
the deviations in (21) are measured from the line of regression.
Hence, we have shown that

$r^2 \le \eta_{yx}^2 \le 1$ .

Moreover, when the regression of $y$ on $x$ is linear, $\eta_{yx}^2 - r^2$
found from the sample differs from zero by an amount
not greater than the fluctuations due to random sampling.
Indeed, the comparison of the quantity $\eta_{yx}^2 - r^2$ with its
sampling errors becomes the most useful known criterion
for testing the linearity of the regression of $y$ on $x$.
For some purposes, it is convenient to express the
correlation ratios in a form involving the standard devia-
tion of the means of arrays. For this purpose, let $\bar{y}_x$ be
the mean of any array of $y$'s, and $\bar{\sigma}_y$ the standard deviation
of the means of arrays when the square $(\bar{y}_x - \bar{y})^2$ of
each deviation is weighted with the number $N_x$ in the
array. Then it follows very simply that

$\eta_{yx}^2 = \bar{\sigma}_y^2/\sigma_y^2$ .

That is, the correlation ratio of $y$ on $x$ is the ratio of the
standard deviation of the means of arrays of $y$'s to the
standard deviation of all $y$'s.
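The contrast between $r$ and $\eta_{yx}$ can be seen numerically. The sketch below is our own illustrative Python (the function names are not from the monograph); it computes $r$ in the usual product-moment way and $\eta_{yx}$ from formula (22), for a parabolic relation in which the array means carry all of the dependence while the linear correlation vanishes:

```python
import math
from collections import defaultdict

def corr_coefficient(xs, ys):
    # Pearson r using population (divide-by-N) moments.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / n)
    sy = math.sqrt(sum((y - my) ** 2 for y in ys) / n)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    return sxy / (sx * sy)

def correlation_ratio(xs, ys):
    # eta_yx^2 = 1 - (mean square about array means) / sigma_y^2, formula (22).
    n = len(ys)
    my = sum(ys) / n
    var_y = sum((y - my) ** 2 for y in ys) / n
    arrays = defaultdict(list)
    for x, y in zip(xs, ys):
        arrays[x].append(y)
    s_bar_sq = sum(sum((y - sum(a) / len(a)) ** 2 for y in a)
                   for a in arrays.values()) / n
    return math.sqrt(1 - s_bar_sq / var_y)

# A parabolic dependence: every dot lies on y = x^2, so eta_yx = 1,
# while the (linear) correlation coefficient r is zero by symmetry.
xs = [-2, -1, 0, 1, 2] * 3
ys = [x * x for x in xs]
```

Here $\eta_{yx} = 1$ although $r = 0$, which is exactly the situation in which the correlation coefficient alone would be misleading.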
The calculation of the correlation ratio with a large
number N of pairs may be carried out very conveniently
as a mere extension of the calculation of the correlation
coefficient. For a form for such calculation, see Handbook
of Mathematical Statistics, page 130.
In order to get a fair approximation to a correlation
ratio in a population from a sample, it is important that
the grouping into class intervals be not so narrow as to
give arrays containing very few variates. Certain valu-
able formulas for the correction of errors due to grouping
have been published.
When the regression is non-linear, the correlation may
be further characterized by the equation of a curve of
regression that passes approximately through the means
of arrays of a given system of variates. As early as 1905,
the parameters of the special regression curves given by
polynomials $y = f(x)$ of the second and third degrees were
determined in terms of power moments and product
moments. In 1921, Karl Pearson published a general
method of determining successive terms of the regression
curve of the form
(24)    $y = f(x) = a_0\psi_0 + a_1\psi_1 + \cdots + a_n\psi_n$ ,

where $a_0, a_1, \ldots, a_n$ are constants to be determined
and the functions $\psi$'s form an orthogonal system of functions
of $x$. That is,

$\sum N_x \psi_p \psi_q = 0 , \qquad p \ne q ,$

if the summation $\sum$ be taken for all values of $x$ corresponding
to a system of arrays with frequency in an $x$-array
given by $N_x$. An exposition of the theory of non-linear
regression curves is somewhat beyond the scope of this
monograph.
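The orthogonality condition can be imitated numerically. The sketch below is our own illustrative code, not Pearson's procedure: it builds $\psi_0, \psi_1, \psi_2$ by Gram-Schmidt orthogonalization of $1, x, x^2$ under the weighted sum $\sum N_x \psi_p \psi_q$, for assumed arrays at the values `xs` with frequencies `Nx`:

```python
def polyval(c, x):
    # Evaluate a polynomial given by coefficient list c (lowest degree first).
    return sum(ci * x ** i for i, ci in enumerate(c))

def inner(a, b, xs, Nx):
    # Discrete inner product sum_x N_x a(x) b(x) over the arrays.
    return sum(w * polyval(a, x) * polyval(b, x) for x, w in zip(xs, Nx))

def orthogonal_polys(xs, Nx, degree):
    # Gram-Schmidt on 1, x, x^2, ... under the weighted inner product above.
    psis = []
    for k in range(degree + 1):
        c = [0.0] * (k + 1)
        c[k] = 1.0
        for p in psis:
            coef = inner(c, p, xs, Nx) / inner(p, p, xs, Nx)
            for i in range(len(p)):
                c[i] -= coef * p[i]
        psis.append(c)
    return psis
```

Any two distinct members of the resulting system then satisfy $\sum N_x \psi_p \psi_q = 0$ by construction.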
33. Multiple correlation. Thus far we have considered
only simple correlation, that is, correlation between two
variables. But situations frequently arise which call for
the investigation of correlation among three or more vari-
ables. A familiar example occurs in the correlation of a
character such as stature in man with statures of each
of the two parents, of each of the four grandparents, and
possibly with statures of others back in the ancestral line.
Other examples can be readily cited. Indeed, it is very
generally true that several variables enter into many prob-
lems of biology, economics, psychology, and education.
The solution of these problems calls for a development
of correlation among three or more variables. Suppose we
have given $N$ sets of corresponding values of $n$ variables
$x_1, x_2, \ldots, x_n$. Assume next that we separate the values
of $x_1$ into classes by selecting class intervals $dx_2$, $dx_3$,
$\ldots$, $dx_n$ of the remaining variables. When we limit the
$x_2$'s to an assigned interval $dx_2$, the $x_3$'s to an assigned interval
$dx_3$, and so on, the set of corresponding $x_1$'s is sometimes
called an array of $x_1$'s.
The locus of the means of such arrays of $x_1$'s in the
theoretical distribution, as $dx_2, dx_3, \ldots, dx_n$ approach
zero, is called the regression surface of $x_1$ on the remaining
variables. It will be convenient to assume that any variable,
$x_j$, is measured from the arithmetic mean of its $N$
given values as an origin. Let $\sigma_j$ be the standard deviation
of the $N$ values of $x_j$, and let $r_{pq}$ be the correlation
coefficient of the $N$ given pairs of values of $x_p$ and $x_q$.
Then we seek to determine $b_{12}, b_{13}, \ldots, b_{1n}, c$, the parameters
in the linear regression surface,

(25)    $x_1 = b_{12}x_2 + b_{13}x_3 + \cdots + b_{1n}x_n + c$ ,

of $x_1$ on the remaining variables so that $x_1$ computed from
(25) will give on the whole the "best" estimates of the
values of $x_1$ that correspond to any assigned values of
$x_2, x_3, \ldots, x_n$. Adopting a least-squares criterion, we
may determine the coefficients in (25) so that

(26)    $U = \sum (x_1 - b_{12}x_2 - b_{13}x_3 - \cdots - b_{1n}x_n - c)^2$

shall be a minimum. This gives for the linear regression
surface of $x_1$ on $x_2, x_3, \ldots, x_n$,

(27)    $x_1 = -\sigma_1 \sum_{q=2}^{n} \frac{R_{1q}}{R_{11}}\,\frac{x_q}{\sigma_q}$ ,
where $R_{pq}$ is the cofactor of the element in the $p$th row and
the $q$th column of the determinant

(28)    $R = \begin{vmatrix} 1 & r_{12} & r_{13} & \cdots & r_{1n} \\ r_{21} & 1 & r_{23} & \cdots & r_{2n} \\ r_{31} & r_{32} & 1 & \cdots & r_{3n} \\ \cdots & \cdots & \cdots & \cdots & \cdots \\ r_{n1} & r_{n2} & r_{n3} & \cdots & 1 \end{vmatrix}$ .
For simplicity we shall limit ourselves to n = 3 in giv-
ing proofs of these statements, but the method can be
extended in a fairly obvious manner from three variables
to any number of variables.
Equating to zero the first derivatives of $U$ in (26)
with respect to $c$, $b_{12}$, and $b_{13}$, we obtain, when $n = 3$, the
equations

$c = 0$ ,
$-2\sum x_2(x_1 - b_{12}x_2 - b_{13}x_3) = 0$ ,
$-2\sum x_3(x_1 - b_{12}x_2 - b_{13}x_3) = 0$ .

The last two equations may be written in the form

$\sum x_1x_2 - b_{12}\sum x_2^2 - b_{13}\sum x_2x_3 = 0$ ,
$\sum x_1x_3 - b_{12}\sum x_2x_3 - b_{13}\sum x_3^2 = 0$ .

By expressing the summations in terms of standard deviations
and correlation coefficients, we have

(29)    $Nb_{12}\sigma_2^2 + Nb_{13}r_{23}\sigma_2\sigma_3 = Nr_{12}\sigma_1\sigma_2$ ,
(30)    $Nb_{12}r_{23}\sigma_2\sigma_3 + Nb_{13}\sigma_3^2 = Nr_{13}\sigma_1\sigma_3$ .
Solving for $b_{12}$ and $b_{13}$, we obtain

$b_{12} = \frac{\sigma_1}{\sigma_2}\,\frac{\begin{vmatrix} r_{12} & r_{23} \\ r_{13} & 1 \end{vmatrix}}{\begin{vmatrix} 1 & r_{23} \\ r_{23} & 1 \end{vmatrix}} = -\frac{\sigma_1}{\sigma_2}\,\frac{R_{12}}{R_{11}}$ ,
$\qquad b_{13} = \frac{\sigma_1}{\sigma_3}\,\frac{\begin{vmatrix} 1 & r_{12} \\ r_{23} & r_{13} \end{vmatrix}}{\begin{vmatrix} 1 & r_{23} \\ r_{23} & 1 \end{vmatrix}} = -\frac{\sigma_1}{\sigma_3}\,\frac{R_{13}}{R_{11}}$ .

Hence

$x_1 = -\sigma_1 \sum_{q=2}^{3} \frac{R_{1q}}{R_{11}}\,\frac{x_q}{\sigma_q}$ ,
where $R_{pq}$ is the cofactor of the $p$th row and $q$th column of

$R = \begin{vmatrix} 1 & r_{12} & r_{13} \\ r_{21} & 1 & r_{23} \\ r_{31} & r_{32} & 1 \end{vmatrix}$ .

If the dispersion (scatter) $\sigma_{1.23\ldots n}$ of the observed
values of $x_1$ from its corresponding computed values on
the hyperplane (27) is defined as the square root of the
mean square of the deviations, that is,
(31)    $\sigma_{1.23\ldots n}^2 = \frac{1}{N}\sum(\text{observed } x_1 - \text{computed } x_1)^2$ ,

then it can be proved that

(32)    $\sigma_{1.23\ldots n} = \sigma_1(R/R_{11})^{1/2}$ .
To prove this for $n = 3$, we may write from (27) and (31)

$\sigma_{1.23}^2 = \frac{1}{N}\sum\left(x_1 + \sigma_1\frac{R_{12}}{R_{11}}\frac{x_2}{\sigma_2} + \sigma_1\frac{R_{13}}{R_{11}}\frac{x_3}{\sigma_3}\right)^2$
$= \frac{\sigma_1^2}{R_{11}^2}\left(R_{11}^2 + R_{12}^2 + R_{13}^2 + 2R_{11}R_{12}r_{12} + 2R_{11}R_{13}r_{13} + 2R_{12}R_{13}r_{23}\right)$
$= \frac{\sigma_1^2}{R_{11}^2}\left[R_{11}(R_{11} + r_{12}R_{12} + r_{13}R_{13}) + R_{12}(R_{12} + r_{12}R_{11} + r_{23}R_{13}) + R_{13}(R_{13} + r_{13}R_{11} + r_{23}R_{12})\right]$ .

Since, from elementary theorems of determinants,

$R_{11} + r_{12}R_{12} + r_{13}R_{13} = R$ ,
$R_{12} + r_{12}R_{11} + r_{23}R_{13} = 0$ ,
$R_{13} + r_{13}R_{11} + r_{23}R_{12} = 0$ ,

we have

(33)    $\sigma_{1.23}^2 = \sigma_1^2\, R/R_{11}$ ,  $\sigma_{1.23} = \sigma_1(R/R_{11})^{1/2}$ .
As an extension of the standard error of estimate with
two variables (p. 87), it is true for $n$ variables that the
standard error $\sigma_{1.23\ldots n}$ of estimating $x_1$ from assigned
values of $x_2, x_3, \ldots, x_n$ is the standard deviation of
each array of $x_1$'s, provided all regressions are linear and
the standard deviation of an array of $x_1$'s is the same for
all sets of assignments of $x_2, x_3, \ldots, x_n$.
Next, we shall inquire into the dispersion of the estimated
values given by (27). Since the mean value of these
estimates is zero, when the origin is at the mean of each
system of variates, we have the standard deviation $\sigma_{1E}$ of
the estimates of $x_1$ given by

$\sigma_{1E}^2 = \frac{\sigma_1^2}{R_{11}^2}\left\{R_{12}^2 + R_{13}^2 + 2R_{12}R_{13}r_{23}\right\} = \sigma_1^2\left(1 - \frac{R}{R_{11}}\right)$ .
The correlation coefficient $r_{1.23\ldots n}$ between the observed
values of $x_1$ and its corresponding estimated values
calculated from the linear function (27) of $x_2, x_3, \ldots, x_n$
is called the multiple correlation coefficient of order $n-1$
of $x_1$ with the other $n-1$ variables. The multiple correlation
coefficient $r_{1.23\ldots n}$ is expressible in terms of simple
correlation coefficients by the formula

(34)    $r_{1.23\ldots n} = [1 - R/R_{11}]^{1/2}$ .
To prove (34), limiting ourselves to $n = 3$, we write

$N\sigma_1\sigma_{1E}\,r_{1.23} = \sum x_1\left(-\sigma_1\frac{R_{12}}{R_{11}}\frac{x_2}{\sigma_2} - \sigma_1\frac{R_{13}}{R_{11}}\frac{x_3}{\sigma_3}\right)
= -\frac{N\sigma_1^2}{R_{11}}\left(R_{12}r_{12} + R_{13}r_{13}\right)
= -\frac{N\sigma_1^2}{R_{11}}(R - R_{11}) = N\sigma_1^2(1 - R/R_{11})$ .

Since

$\sigma_{1E} = \sigma_1[1 - R/R_{11}]^{1/2}$ ,

we have the result sought,

$r_{1.23} = [1 - R/R_{11}]^{1/2}$ .
The relation (34) is very significant because it enables us
to express multiple correlation coefficients in terms of sim-
ple correlation coefficients.
From equations (32) and (34), it follows that

(35)    $\sigma_{1.23\ldots n}^2 = \sigma_1^2(1 - r_{1.23\ldots n}^2)$ .
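As a numerical check on formulas (27)-(30) and (34) in the case $n = 3$, the following sketch (illustrative Python; the function names are ours) computes $b_{12}$ and $b_{13}$ from the cofactors of the determinant (28) and verifies that they satisfy the normal equations (29) and (30); it also evaluates the multiple correlation coefficient of (34):

```python
import math

def det3(m):
    # Determinant of a 3x3 matrix by cofactor expansion along the first row.
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def cofactor(m, p, q):
    # Signed cofactor of row p, column q of a 3x3 matrix.
    minor = [[m[i][j] for j in range(3) if j != q] for i in range(3) if i != p]
    return (-1) ** (p + q) * (minor[0][0] * minor[1][1] - minor[0][1] * minor[1][0])

def regression_coeffs(r12, r13, r23, s1, s2, s3):
    # b_1q = -(s1/sq) R_1q / R_11, the coefficients of the plane (27) for n = 3.
    R = [[1, r12, r13], [r12, 1, r23], [r13, r23, 1]]
    R11 = cofactor(R, 0, 0)
    b12 = -(s1 / s2) * cofactor(R, 0, 1) / R11
    b13 = -(s1 / s3) * cofactor(R, 0, 2) / R11
    return b12, b13

def multiple_r(r12, r13, r23):
    # r_1.23 = [1 - R/R_11]^(1/2), formula (34).
    R = [[1, r12, r13], [r12, 1, r23], [r13, r23, 1]]
    return math.sqrt(1 - det3(R) / cofactor(R, 0, 0))
```

With unit standard deviations, the normal equations reduce to $b_{12} + r_{23}b_{13} = r_{12}$ and $r_{23}b_{12} + b_{13} = r_{13}$, which the computed coefficients satisfy.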
34. Partial correlation. It is often important to obtain
the degree of correlation between two variables $x_1$ and $x_2$
when the other variables $x_3, x_4, \ldots, x_n$ have assigned
values. For example, we might find the correlation of
statures of fathers and sons when the stature of the moth-
er is an assigned constant, say 62 inches. In general, sup-
pose we have found a correlation between characters A
and B, and that it is a plausible interpretation that the
correlation thus found is due to the correlation of each
of them with a character C. In this case we could remove
the influence of C, if we had a sufficient amount of data,
by restricting our data to a universe of A and B corre-
sponding to an assigned C.
In accord with this notion, we may define a partial
correlation coefficient $r'_{12.34\ldots n}$ of $x_1$ and $x_2$ for assigned
$x_3, x_4, \ldots, x_n$, as the correlation coefficient of $x_1$ and
$x_2$ in the part of the population for which $x_3, x_4, \ldots, x_n$
have assigned values. A change in the selection of assigned
values may lead to the same or to different values
of $r'_{12.34\ldots n}$.
Suppose we are dealing with a population for which
the regression curves are straight lines and the regression
surfaces are planes. Thus, let us assume that the theoretical
mean or expected values of $x_1$ and $x_2$ for an assigned
$x_3, x_4, \ldots, x_n$ are

$b_{13}x_3 + b_{14}x_4 + \cdots + b_{1n}x_n$ ,
$b_{23}x_3 + b_{24}x_4 + \cdots + b_{2n}x_n$ ,

respectively. Then a partial correlation coefficient
$r'_{12.34\ldots n}$ is the simple correlation coefficient of residuals

$x_{1.34\ldots n} = x_1 - b_{13}x_3 - b_{14}x_4 - \cdots - b_{1n}x_n$

and

$x_{2.34\ldots n} = x_2 - b_{23}x_3 - b_{24}x_4 - \cdots - b_{2n}x_n$ ,

limited to the part of the population $N_{34\ldots n}$ of the
total $N$ for which $x_3, x_4, \ldots, x_n$ are fixed.
Suppose further that the population is such that any
change in the assignment of values to $x_3, x_4, \ldots, x_n$
does not change the standard deviation of $x_{1.34\ldots n}$,
nor of $x_{2.34\ldots n}$, nor the value of $r'_{12.34\ldots n}$. Such a
population suggests that we define

(36)    $r_{12.34\ldots n} = \frac{\sum x_{1.34\ldots n}\, x_{2.34\ldots n}}{N\sigma_{1.34\ldots n}\,\sigma_{2.34\ldots n}}$ ,

where the summation is extended to $N$ pairs of residuals,
as the partial correlation coefficient of $x_1$ and $x_2$ for all sets
of assignments of $x_3, x_4, \ldots, x_n$.
If the population is such that $r'_{12.34\ldots n}$ is not the
same for each different set of assignments of $x_3, x_4, \ldots, x_n$,
the right-hand member of (36) may still be regarded as
a sort of average value of the correlation coefficients
of $x_1$ and $x_2$ in subdivisions of a population obtained by
assigning $x_3, x_4, \ldots, x_n$, or it may be regarded as the
correlation coefficient between the deviations of $x_1$ and
$x_2$ from the corresponding predicted values given by their
linear regression equations on $x_3, x_4, \ldots, x_n$.
The partial correlation coefficient as given in (36) is
expressible in terms of simple correlation coefficients by
the formula

(37)    $r_{12.34\ldots n} = \frac{-R_{12}}{[R_{11}R_{22}]^{1/2}}$ ,

where $R_{pq}$ is a cofactor defined in §33.
We may prove (37), limiting ourselves to $n = 3$, as follows:
By definition

$r_{12.3} = \frac{\sum x_{1.3}\,x_{2.3}}{N\sigma_{1.3}\,\sigma_{2.3}}
= \frac{\sum x_1x_2 - r_{13}\frac{\sigma_1}{\sigma_3}\sum x_2x_3 - r_{23}\frac{\sigma_2}{\sigma_3}\sum x_1x_3 + r_{13}r_{23}\frac{\sigma_1\sigma_2}{\sigma_3^2}\sum x_3^2}{\left[\sum\left(x_1 - r_{13}\frac{\sigma_1}{\sigma_3}x_3\right)^2 \sum\left(x_2 - r_{23}\frac{\sigma_2}{\sigma_3}x_3\right)^2\right]^{1/2}}
= \frac{r_{12} - r_{13}r_{23}}{[(1 - r_{13}^2)(1 - r_{23}^2)]^{1/2}} = \frac{-R_{12}}{[R_{11}R_{22}]^{1/2}}$ .

Thus, (37) is proved for $n = 3$.
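Formula (37) for $n = 3$ is easy to check numerically. The sketch below is our own illustrative Python (the names are not from the monograph); it computes $r_{12.3}$ both from the simple correlation coefficients directly and from the cofactor form, which must agree:

```python
import math

def partial_r(r12, r13, r23):
    # r_12.3 = (r12 - r13 r23) / [(1 - r13^2)(1 - r23^2)]^(1/2).
    return (r12 - r13 * r23) / math.sqrt((1 - r13 ** 2) * (1 - r23 ** 2))

def partial_r_cofactor(r12, r13, r23):
    # The same quantity as -R_12 / [R_11 R_22]^(1/2), formula (37),
    # with R the three-rowed correlation determinant.
    R12 = -(r12 - r23 * r13)
    R11 = 1 - r23 ** 2
    R22 = 1 - r13 ** 2
    return -R12 / math.sqrt(R11 * R22)
```

For any admissible triple $(r_{12}, r_{13}, r_{23})$ the two functions return the same value.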
An important relation between partial and multiple
correlation coefficients may now be derived. From (37)
we have

$1 - r_{12.34\ldots n}^2 = \frac{R_{11}R_{22} - R_{12}^2}{R_{11}R_{22}}$ .

By a well-known theorem of determinants,

$\begin{vmatrix} R_{11} & R_{12} \\ R_{12} & R_{22} \end{vmatrix} = R_{11}R_{22} - R_{12}^2 = R\,R_{11,22}$ ,

where $R_{11,22}$ is the minor of $R$ obtained by deleting its first
two rows and first two columns. Hence we have

$1 - r_{12.34\ldots n}^2 = \frac{R\,R_{11,22}}{R_{11}R_{22}} = \frac{R/R_{11}}{R_{22}/R_{11,22}} = \frac{1 - r_{1.23\ldots n}^2}{1 - r_{1.34\ldots n}^2}$ ,

since from (32) and (35),

$\frac{R}{R_{11}} = 1 - r_{1.23\ldots n}^2$ ,

and similarly

$\frac{R_{22}}{R_{11,22}} = 1 - r_{1.34\ldots n}^2$ .

Thus we can express the partial correlation coefficient
$r_{12.34\ldots n}$ of order $n-2$ (the number of variables held
constant) in terms of the multiple correlation coefficient
$r_{1.23\ldots n}$ of order $n-1$ and the multiple correlation coefficient
$r_{1.34\ldots n}$ of order $n-2$.
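The relation just derived can be verified numerically for $n = 3$, where $r_{1.34\ldots n}$ reduces to the simple coefficient $r_{13}$. The small check below is our own illustrative Python (the helper names are not from the monograph):

```python
def multiple_r_sq(r12, r13, r23):
    # r_1.23^2 = 1 - R/R_11, with R the three-rowed correlation determinant.
    R = (1 - r23 ** 2) - r12 * (r12 - r23 * r13) + r13 * (r12 * r23 - r13)
    return 1 - R / (1 - r23 ** 2)

def partial_r_sq(r12, r13, r23):
    # r_12.3^2 from formula (37).
    return (r12 - r13 * r23) ** 2 / ((1 - r13 ** 2) * (1 - r23 ** 2))

r12, r13, r23 = 0.6, 0.5, 0.3
lhs = 1 - partial_r_sq(r12, r13, r23)
rhs = (1 - multiple_r_sq(r12, r13, r23)) / (1 - r13 ** 2)
```

Both sides evaluate to the same number, as the determinant identity requires.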
35. Non-linear regression in $n$ variables — multiple
correlation ratio. The theory of correlation for non-linear
regression lends itself to extension to the case of more
than two variables, as has been demonstrated by the contributions
of L. Isserlis and Karl Pearson.
Consider the variables $x_1, x_2, \ldots, x_n$, and fix attention
on an array of $x_1$'s which corresponds to assigned
values of $x_2, x_3, \ldots, x_n$. Next, let $\bar{x}_{1.23\ldots n}$ be the
mean of the values in the array of $x_1$'s and let $\bar{\sigma}_{1.23\ldots n}$
be the standard deviation of these means of arrays of $x_1$'s,
where the square of each deviation of $\bar{x}_{1.23\ldots n}$ from the
mean of the $x_1$'s is weighted with the number in the array in
finding this standard deviation. Then the multiple correlation
ratio $\eta_{1.23\ldots n}$ of $x_1$ on $x_2, x_3, \ldots, x_n$ may be
defined by

(38)    $\eta_{1.23\ldots n}^2 = \frac{\bar{\sigma}_{1.23\ldots n}^2}{\sigma_1^2}$ .

The analogy with the case of the correlation ratio for
two variables seems fairly obvious. While the method of
computing the multiple correlation ratio $\eta_{1.23\ldots n}$ is
simple in principle, it is unfortunately laborious from the
arithmetic standpoint.
36. Remarks on the place of probability in the regres-
sion method. Thus far we have discussed simple correla-
tion by the regression method without using probabilities
in explicit form. To be sure, probability theory is in-
volved in the background. It seems fairly obvious that
it would be of fundamental interest to construct urn
schemata which would give a meaning to the correlation
and regression coefficients in pure chance. In a paper
published by the author in 1920, certain urn schemata
were devised which give linear regression and very simple
values for the correlation coefficient. Other schemata ap-
parently equally simple give non-linear regression. The
general plan of the schemata consists in requiring certain
elements to be common in successive random drawings.
It appears that the construction of such urn schemata
will tend to give correlation a place in the elementary
theory of probability.
In a recent book by the Russian mathematician,
A. A. Tschuprow, an important step has been taken to-
ward connecting the regression method of dealing with
correlation more closely with the theory of probability.
This is accomplished by a consideration of the under-
lying definitions and concepts for a priori distribu-
tions.
It may be noted that we have not based our develop-
ment of the regression method on a precise definition of
correlation. Instead we have attempted a sort of genetic
development. It may at this point be helpful in forming
a proper notion of the scope and limitations of the regres-
sion method to give a definition of correlation from the re-
gression viewpoint. It seems that a general definition will
involve probabilities because we shall almost surely wish
to idealize actual distributions into theoretical distribu-
tions or laws of frequency for purposes of definition. In a
general sense, we may say that $y$ is correlated with $x$
whenever the theoretical distributions in arrays of $y$'s are
not identical for all possible assigned values of $x$, and we
say that $y$ is uncorrelated with $x$ whenever the theoretical
distributions in arrays of $y$'s are identical with each other
for all possible values of $x$. By the identity of the theoretical
distributions in arrays of $y$'s, we mean that they
have equal means, standard deviations, and other parameters
required to characterize completely the distributions.
It is fairly obvious that our discussion of the regression
method is incomplete in a sense because we have
not given a complete characterization of distributions in
arrays. Our characterization of the statistical dependence
of $y$ on $x$ may be regarded as complete when the
arrays of y's are normal distributions, because the dis-
tributions are then completely characterized by their
arithmetic means and standard deviations.
THE CORRELATION SURFACE METHOD OF DESCRIPTION
37. The normal correlation surfaces. The function

$z = f(x_1, x_2, \ldots, x_n)$

is called a frequency function of the $n$ variables $x_1, x_2, \ldots, x_n$
if

$z\,dx_1\,dx_2 \cdots dx_n$

gives, to within infinitesimals of higher order, the probability
that a set of values of $x_1, x_2, \ldots, x_n$ taken at
random will lie in the infinitesimal region bounded by $x_1$
and $x_1 + dx_1$, $x_2$ and $x_2 + dx_2$, $\ldots$, $x_n$ and $x_n + dx_n$.
When the variables are not independent in the probability
sense, the surface represented by $z = f(x_1, x_2, \ldots, x_n)$
is called a correlation surface.
With the notation of §29 for simple correlation, the
natural extension of the theory underlying the normal
frequency function of one variable to functions of two
variables $x$ and $y$ leads to the correlation surface

$z = \frac{1}{2\pi\sigma_x\sigma_y(1-r^2)^{1/2}}\, e^{-\frac{1}{2(1-r^2)}\left(\frac{x^2}{\sigma_x^2} + \frac{y^2}{\sigma_y^2} - \frac{2rxy}{\sigma_x\sigma_y}\right)}$ .
Moreover, with the notation of §33 on multiple correlation,
the natural extension to the case of a function of $n$
normally correlated variables $x_1, x_2, \ldots, x_n$ gives a
frequency function of the exponential type

$z = \frac{1}{(2\pi)^{n/2} R^{1/2} \sigma_1\sigma_2\cdots\sigma_n}\, e^{-\phi/2}$ ,

where $\phi$ is a homogeneous quadratic function of the $n$
variables which may be written in the form

$\phi = \frac{1}{R}\sum_{p}\sum_{q} R_{pq}\,\frac{x_p}{\sigma_p}\,\frac{x_q}{\sigma_q}$ ,
the determinant R and its cofactors Rpp and Rpq being
defined in § 33. We thus have a correlation surface in
space of $n + 1$ dimensions.
For purposes of simplicity we shall limit our deriva-
tions of normal frequency functions to functions of two
and three variables thus restricting the geometry involved
to space of three and four dimensions.
The equation of the normal frequency surface may be
derived from various sets of assumptions analogous to and
extensions of sets of assumptions from which the normal
frequency curve may be derived. Some of these deriva-
tions make no explicit use of the fact that in normal
correlation the regression is linear. That is, linear re-
gression is considered as a property of the frequency sur-
face obtained from other assumptions. But we may con-
nect the frequency-surface method closely with the re-
gression method by involving linear regression of one of
the variables on the others as one of the assumptions from
which to derive the surface. This is the plan we shall
adopt in the following derivation. Let us assume, first,
that one set of variates, say the x's, are distributed
normally about their mean value taken as an origin.
Then in our notation (p. 47 and §29)

(39)    $\frac{1}{\sigma_x\sqrt{2\pi}}\, e^{-\frac{x^2}{2\sigma_x^2}}\, dx$ ,

to within infinitesimals of higher order, is the probability
that an $x$ taken at random will lie in the interval $dx$.
Assume next that any array of $y$'s corresponding to an
assigned $x$ is a normal distribution with the standard
deviation of an array given by $\sigma_y(1-r^2)^{1/2}$ as found earlier
in this chapter (§31), and, finally, assume that the regression
of $y$ on $x$ is linear. Then in the notation of simple
correlation

(40)    $\frac{1}{\sigma_y(1-r^2)^{1/2}\sqrt{2\pi}}\, e^{-\frac{\left(y - r\frac{\sigma_y}{\sigma_x}x\right)^2}{2\sigma_y^2(1-r^2)}}\, dy$
is, to within infinitesimals of higher order, the probability
that a $y$ taken at random from an assigned array of $y$'s
will lie in the interval dy.
By using the elementary principle that the probability
that both of two events will occur is equal to the product
of the probabilities that the first will occur and that the
second will occur when the first is known to have occurred,
we have the product $z\,dx\,dy$ of (39) and (40) for the probability,
to within infinitesimals of higher order, that $x$ will
fall in $dx$ and the corresponding $y$ in $dy$, where

(41)    $z = \frac{1}{2\pi\sigma_x\sigma_y(1-r^2)^{1/2}}\, e^{-\frac{1}{2(1-r^2)}\left(\frac{x^2}{\sigma_x^2} + \frac{y^2}{\sigma_y^2} - \frac{2rxy}{\sigma_x\sigma_y}\right)}$

is the normal correlation surface in three dimensions.
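As a crude check that (41) is a proper frequency function, the sketch below (our own illustrative Python) evaluates $z$ on a grid extending six standard deviations in each direction and forms a Riemann sum, which should be close to unity:

```python
import math

def z(x, y, sx, sy, r):
    # The normal correlation surface, equation (41).
    q = (x / sx) ** 2 + (y / sy) ** 2 - 2 * r * x * y / (sx * sy)
    return math.exp(-q / (2 * (1 - r ** 2))) / (
        2 * math.pi * sx * sy * math.sqrt(1 - r ** 2))

sx, sy, r = 1.0, 2.0, 0.5
h = 0.05
# Riemann sum over a grid covering six standard deviations each way.
total = sum(z(i * h, j * h, sx, sy, r)
            for i in range(-120, 121)
            for j in range(-240, 241)) * h * h
```

The neglected tail beyond six standard deviations is entirely negligible at this accuracy.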
Let us turn next to the derivation of the normal corre-
lation surface in four dimensions. Following the notation
of multiple correlation we seek a normal frequency function

$z = f(x_1, x_2, x_3)$ .

We shall assume first that pairs of the variates, say
of $x_2$'s and $x_3$'s, are normally distributed. Then by what
has just been demonstrated about the form of the correlation
surface in three dimensions, the expression

(42)    $\frac{1}{2\pi\sigma_2\sigma_3(1-r_{23}^2)^{1/2}}\, e^{-\frac{1}{2(1-r_{23}^2)}\left(\frac{x_2^2}{\sigma_2^2} + \frac{x_3^2}{\sigma_3^2} - \frac{2r_{23}x_2x_3}{\sigma_2\sigma_3}\right)}\, dx_2\, dx_3$
is, to within infinitesimals of higher order, the probability
that a point $(x_2, x_3)$ taken at random lies within the area
$dx_2\,dx_3$. We next assume that the regression of $x_1$ on $x_2$ and
$x_3$ is linear, and that each array of $x_1$'s corresponding to
an assigned $(x_2, x_3)$ is a normal distribution with standard
deviation

$\sigma_{1.23} = \sigma_1(R/R_{11})^{1/2}$

given by (32).
Then in the notation of multiple correlation, the probability
that a variate taken at random in an assigned
$(x_2, x_3)$-array of $x_1$'s will lie in $dx_1$ is given, to within
infinitesimals of higher order, by

(43)    $\frac{R_{11}^{1/2}}{\sigma_1(2\pi R)^{1/2}}\, e^{-\frac{R_{11}}{2R\sigma_1^2}\left(x_1 + \sigma_1\frac{R_{12}}{R_{11}}\frac{x_2}{\sigma_2} + \sigma_1\frac{R_{13}}{R_{11}}\frac{x_3}{\sigma_3}\right)^2}\, dx_1$ .
Then the probability that a point $(x_1, x_2, x_3)$ taken at
random will lie in the volume $dx_1\,dx_2\,dx_3$ is given, to within
infinitesimals of higher order, by the product of (42) and
(43). This gives, after some simplification, for the probability
in question, $z\,dx_1\,dx_2\,dx_3$, where

(44)    $z = \frac{1}{(2\pi)^{3/2} R^{1/2} \sigma_1\sigma_2\sigma_3}\, e^{-\phi/2}$

and

$\phi = \frac{1}{R}\left(R_{11}\frac{x_1^2}{\sigma_1^2} + R_{22}\frac{x_2^2}{\sigma_2^2} + R_{33}\frac{x_3^2}{\sigma_3^2} + 2R_{12}\frac{x_1x_2}{\sigma_1\sigma_2} + 2R_{13}\frac{x_1x_3}{\sigma_1\sigma_3} + 2R_{23}\frac{x_2x_3}{\sigma_2\sigma_3}\right)$ .
38. Certain properties of normally correlated distributions.
The equal-frequency curves obtained by making
$z$ take constant values in equation (41) are an infinite
system of homothetic ellipses, any one of which has an
equation of the form

$\frac{x^2}{\sigma_x^2} + \frac{y^2}{\sigma_y^2} - \frac{2rxy}{\sigma_x\sigma_y} = \lambda^2$ .

The area of the ellipse is

$\frac{\pi\sigma_x\sigma_y\lambda^2}{(1-r^2)^{1/2}}$ ,

and the semiaxes are given by $a = k\lambda$ and $b = k'\lambda$, where
$k$ and $k'$ are functions of $\sigma_x$, $\sigma_y$, and $r$. The probability
that a point $(x, y)$ taken at random will fall within any
ellipse obtained by assigning $\lambda$ is given by

(45)    $\frac{1}{1-r^2}\int_0^{\lambda} t\, e^{-\frac{t^2}{2(1-r^2)}}\, dt = 1 - e^{-\frac{\lambda^2}{2(1-r^2)}}$ .
Attention has often been called to the equal-frequency
ellipse known as the "probable" ellipse. The probable
ellipse may be defined as that ellipse of the system such
that the probability is 1/2 that a point $(x, y)$ of the
scatter-diagram (see Fig. 18, p. 109) lies within it. This
means by (45) that

$e^{-\frac{\lambda^2}{2(1-r^2)}} = \frac{1}{2}$ , whence $\lambda^2 = 1.3863\,(1-r^2)$ .
From (45) it follows that $\frac{\lambda}{1-r^2}\, e^{-\frac{\lambda^2}{2(1-r^2)}}\, \Delta\lambda$ gives,
to within infinitesimals of higher order, the probability
that a point $(x, y)$ taken at random will fall in a small
ring obtained by taking values of $\lambda$ in $\Delta\lambda$.
We may determine the ellipse along which, for a
given small ring $\Delta\lambda$, we should expect more points $(x, y)$
than along any other ellipse of the system. For a constant
$\Delta\lambda$, the probability is a maximum when $\lambda^2 = 1 - r^2$. Hence,
what may be called the ellipse of maximum probability
is

$\frac{x^2}{\sigma_x^2} + \frac{y^2}{\sigma_y^2} - \frac{2rxy}{\sigma_x\sigma_y} = 1 - r^2$ .
To illustrate the meaning of this ellipse, we may say
that in Bertrand's illustration of shooting a thousand
shots at a target, the probability is greater that a shot
will strike along this ellipse than along any other ellipse
of the system. It is an interesting fact that the ellipse
of maximum probability is identical with the orthogonal
projection of parabolic points of the correlation surface
on the plane of distribution. To prove this theorem, we
simply find the locus of parabolic points on the surface
(41) by means of the well-known condition

$\frac{\partial^2 z}{\partial x^2}\,\frac{\partial^2 z}{\partial y^2} - \left(\frac{\partial^2 z}{\partial x\,\partial y}\right)^2 = 0$ .

This gives

$\frac{x^2}{\sigma_x^2} + \frac{y^2}{\sigma_y^2} - \frac{2rxy}{\sigma_x\sigma_y} = 1 - r^2$ ,

which establishes the theorem.
By comparing $\lambda^2 = 1 - r^2$ with $\lambda^2 = 1.3863\,(1-r^2)$, we
note that the probable ellipse is larger than the ellipse of
maximum probability. For the statures of 1,078 husbands
and wives, the two ellipses just discussed are shown on the
scatter-diagram in Figure 18. By actual count from the
drawing (Fig. 18), it turns out that 536 of the 1,078 points
are within the probable ellipse and 412 are within the
ellipse of maximum probability. These numbers differ
from the theoretical values by amounts well within what
should be expected as chance fluctuations.
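The theoretical fractions (1/2 for the probable ellipse, $1 - e^{-1/2} \approx 0.39$ for the ellipse of maximum probability) can be imitated by simulation. The sketch below is our own illustrative Python; the construction $y = ru + (1-r^2)^{1/2}v$ from independent unit normals is a standard device for producing correlated normal deviates, not anything taken from the monograph:

```python
import math
import random

def inside_fraction(r, lam_sq, n=20000, seed=0):
    # Fraction of simulated normally correlated points (unit sigmas) with
    # x^2 + y^2 - 2 r x y <= lam_sq; the sigmas would only rescale the axes.
    rng = random.Random(seed)
    s = math.sqrt(1 - r ** 2)
    count = 0
    for _ in range(n):
        u, v = rng.gauss(0, 1), rng.gauss(0, 1)
        x, y = u, r * u + s * v      # correlated pair with correlation r
        if x * x + y * y - 2 * r * x * y <= lam_sq:
            count += 1
    return count / n

r = 0.5
half_frac = inside_fraction(r, 1.3863 * (1 - r ** 2))   # probable ellipse
max_frac = inside_fraction(r, 1 - r ** 2)               # ellipse of maximum probability
```

With 20,000 points the observed fractions agree with the theoretical values to within ordinary sampling fluctuation, just as the counts 536 and 412 out of 1,078 do above.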
Another interesting problem in connection with the
correlation surface relates to the determination of the
locus along which the frequency or density of points on
the plane of distribution (scatter-diagram) bears a simple
relation to the corresponding density under independence.
Thus, we seek the curve along which dots of the scatter-
diagram are k times as frequent as they would be under
independence where ^ is a constant. Equating z in (41)
to k times the corresponding value of z when r = in (41),
we obtain after slight simplification the hyperbola
(Fig. 18)
Karl Pearson dealt with this curve for $k = 1$. That is,
he considered the locus along which the density of points
of the scatter-diagram is the same as it would have been
under independence. The fact that the density of distribution
at the centroid in (41) is $1/(1-r^2)^{1/2}$ times as
much as it would be under independence naturally suggests
the study of the locus of all points for which
$k = 1/(1-r^2)^{1/2}$ in (46). It turns out that in this case
the hyperbola degenerates into the straight lines

$\frac{y}{\sigma_y} = \frac{1 \pm (1-r^2)^{1/2}}{r}\,\frac{x}{\sigma_x}$ .

These lines are shown as lines $AB$ and $CD$ on Figure 18.
They separate the plane of distribution into four com-
partments such that one-fourth is the probability that a
pair of values (x,y) taken at random will give a point
falling into any prescribed one of these compartments.
Although no further discussion of the properties of
normal correlation surfaces will be attempted in this
monograph, certain properties analogous to those men-
tioned for the surface in three dimensions would follow
rather readily in the case of the surfaces in higher dimen-
sions. Thus the system of ellipsoids of equal frequencies
has been studied to some extent. In a paper by James
McMahon, the connection between the geometry of the
hypersphere and the theory of normal frequency func-
tions of n variables is established by linearly transforming
the hyperellipsoids of equal frequency into a family of
hyperspherical surfaces, and by applying the formulas of
hyperspherical goniometry to obtain theorems in multiple
and partial correlations.
39. Remarks on further methods of characterizing
correlation. In bringing to a conclusion our discussion of
correlation, it may be of interest to point out a few of the
limitations and omissions in our treatment, and to give
certain references that would facilitate further reading.
We have not even touched on the methods of dealing
with correlation of characters which do not seem to admit
of exact measurement, but admit of classification; for
example, eye color, hair color, and temperament may be
regarded as such characters. Such characters are some-
times called qualitative characters to distinguish them
from quantitative characters. The correlation between
two such characters has been dealt with in some cases by
the method of tetrachoric correlation, in other cases by
the method of contingency, and by the method of correlation
in ranks in cases where the items are ordered
but not measured. We have not touched on the methods
of dealing with correlation in time series — a subject of
much importance in the methodology of economic statistics.
The methods and theories of connection and concordance
of Gini for dealing with correlation have been
omitted. No discussion has been given of the fundamental
work of Bachelier on correlation theory in his treatment
of continuous probabilities of two or more variables. Our
discussion of frequency surfaces in §37 is limited to normal
correlation surfaces. The way is, however, fairly clear for
the extension of the Gram-Charlier system of representation
to distributions of two or more variables which are
not normally distributed.
While great difficulties have been encountered in the
past thirty years in attempts to pass naturally from the
Pearson system of generalized frequency curves to analo-
gous surfaces for the characterization of the distribution
of two correlated variables, it is of considerable interest
to remark that substantial progress has been made recently
on the solution of this problem by Narumi, Pearson,
and Camp.
Although the many omissions make it fairly obvious
that our discussion is not at all complete, it is hoped that
enough has been said about the theory of correlation to
indicate that this theory may be properly considered as
constituting an extensive branch in the methodology of
science that should be further improved and extended.
CHAPTER V
RANDOM SAMPLING FLUCTUATIONS
40. Introduction. In Chapter II we have dealt to some
extent with the effects of random sampling fluctuations
on relative frequencies. But it is fairly obvious that the
interest of the statistician in the effects of sampling fluc-
tuations extends far beyond the fluctuations in relative
frequencies. To illustrate, suppose we calculate any sta-
tistical measure such as an arithmetic mean, median,
standard deviation, correlation coefficient, or parameter
of a frequency function from the actual frequencies given
by a sample of data. If we need then either to form a
judgment as to the stability of such results from sample
to sample or to use the results in drawing inferences
about the sampled population, the common-sense process
of induction involved is much aided by a knowledge of
the general order of magnitude of the sampling discrep-
ancies which may reasonably be expected because of the
limited size of the sample from which we have calculated
our statistical measures.
We may very easily illustrate the nature of the more
common problems of sampling by considering the deter-
mination of certain characteristics of a race of men. For
example, suppose we wish to describe any character such
as height, weight, or other measurable attributes among
the white males age 30 in the race. We should almost
surely attempt merely to construct our science on the
basis of results obtained from the sample. Then the question
arises: What is an adequate sample for a particular
purpose? The theory of sampling throws some light on
this question. The development of the elements of a the-
ory of sampling fluctuations in various averages, coeffi-
cients, and parameters is thus of fundamental importance
in regarding the results obtained from a sample as ap-
proximate representatives of the results that would be
obtained if the whole indefinitely large population were
taken.
One of the difficult and practical questions involved
in making statistical inquiries by sample relates to the
invention of satisfactory devices for obtaining a random
sample at the source of material. A result obtained from
a sample unless taken with great care may diverge signifi-
cantly from the true value characteristic of the sampled
population. For example, the writer had an experience
in attempting to pick up a thousand ears of Indian corn
at random with respect to size of ears. It soon appeared
fairly obvious that instinctively one tended to make
"runs" on ears of approximately the same size. The sample
would probably not be taken at random when thus
drawn. Such systematic divergence from conditions nec-
essary for obtaining a random sample is assumed to be
eliminated before the results that follow from the theory
of random sampling fluctuations are applicable. In the
practical applications of sampling theory, it is thus im-
portant to remember that the conditions for random
sampling at the source of data are not always easily ful-
filled. In fact, it seems important in certain investigations
to devise special schemes for obtaining a random sample.
For example, we may sometimes improve the conditions
for drawing a random sample of individuals by the use
of a ball or card bearing the number of each individual
of a much larger aggregate than the sample we propose
to measure and by then drawing the sample by lot from
such a collection of balls or cards after they have been
thoroughly mixed. Even with urn schemata containing
white and black balls thoroughly mixed, it must be assumed
further that one kind of ball is not more slippery
than the other, since slippery balls evade being drawn. The
appropriate devices for obtaining a random sample de-
pend almost entirely on the nature of the particular field
of inquiry, and we shall in the following discussion simply
assume that random samples can be drawn.
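In modern terms, the balls-or-cards device just described is sampling without replacement from a numbered aggregate. A minimal sketch in Python follows; the population and sample sizes are illustrative assumptions, not taken from the text:

```python
import random

random.seed(1)

# Number every individual of the larger aggregate, as with balls or cards.
population = list(range(1, 10001))   # 10,000 numbered individuals (illustrative)

# Thorough mixing followed by drawing by lot is sampling without replacement.
sample = random.sample(population, 1000)
```

The essential property secured by the device is that no individual can appear twice and every individual has the same chance of selection.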
In an inquiry by sample, the following fundamental
question comes up very naturally about any result, say
a mean value x̄, to be obtained from a sample of s indi-
viduals: What is the probability that x̄ will deviate not
more numerically than an assigned positive number δ
from the corresponding unknown true value x̄ that would
be given by using an unlimited supply of the material
from which the s variates are drawn? This question pre-
sents difficulties. An ideal answer is not available, but
valuable estimates of the probability called for in this
question may be made under certain conditions by a
procedure which involves finding the standard deviation
of random sampling deviations.
For the unknown true value x̄ referred to above, con-
tinental European writers very generally use the mathe-
matical expectation or the expected value of the variable
(cf. § 6). In what follows, we shall to some extent adopt
this practice and shall find it convenient to assume the
following propositions without taking the space to demon-
strate them:
I. The expected value E[x − E(x)] of deviations of a
variable from its expected value E(x) is zero.
II. The expected value of the sum of two variables is
the sum of their expected values. That is, E(x + y) =
E(x) + E(y).
III. The expected value of the product of a constant
and a variable is equal to the product of the constant by
the expected value of the variable. That is, E(cx) = cE(x).
IV. The expected value of the product xy of corre-
sponding values of two mutually independent variables
x and y is equal to the product of their expected values,
where we call x and y mutually independent if the law of
distribution of each of them remains the same whatever
values are assigned to the other.
V. In particular, if x and y are corresponding devia-
tions of two mutually independent variables from their
expected values, the expected value of the product xy is
zero. It is fairly obvious that V follows from I and IV.
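Propositions I–IV may be checked numerically. The following Python sketch, with two arbitrary discrete distributions as illustrative assumptions, verifies each by simulation:

```python
import random

random.seed(2)
N = 100_000

# Two mutually independent variables with arbitrary discrete distributions.
xs = [random.choice([0, 1, 2]) for _ in range(N)]
ys = [random.choice([1, 3]) for _ in range(N)]

def mean(v):
    return sum(v) / len(v)

Ex, Ey = mean(xs), mean(ys)

# I: the mean deviation from the mean is zero (an identity for samples).
assert abs(mean([x - Ex for x in xs])) < 1e-6
# II: E(x + y) = E(x) + E(y).
assert abs(mean([x + y for x, y in zip(xs, ys)]) - (Ex + Ey)) < 1e-6
# III: E(cx) = cE(x), here with c = 5.
assert abs(mean([5 * x for x in xs]) - 5 * Ex) < 1e-6
# IV: for independent x and y, E(xy) agrees with E(x)E(y) to sampling error.
assert abs(mean([x * y for x, y in zip(xs, ys)]) - Ex * Ey) < 0.05
```

Propositions I–III hold exactly for sample means; IV holds only within sampling error, which is precisely the distinction the text draws between identities and expected values.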
It is convenient in the discussion of random sampling
fluctuations to deal with the problem of the distribution
of results from samples of equal size. To give a simple
example, let us conceive of taking a random sample con-
sisting of 1,000 men of a well-defined race in which some
character is measured giving us 1,000 variates. Next, sup-
pose we repeat the process until we have 1,000 such sam-
ples of 1,000 men in each sample. Then each of the sam-
ples would have its own arithmetic mean, median, mode,
standard deviation, moments, and so on. Consider next
the 1,000 results of a given kind, say the 1,000 arithmetic
means from the samples. They would almost surely differ
but slightly from one another in comparison with differ-
ences between extreme individual variates. But if the
measurements are reasonably accurate the means would
differ and form a frequency distribution. This frequency
distribution of means would have its own mean (mean of
means) and its own standard deviation. We are especially
interested in such a standard deviation, for it may be
taken as an approximate measure of the variability or
dispersion of means obtained from different samples. This
standard deviation (standard error) would no doubt be a
fairly satisfactory measure of sampling fluctuations for
certain purposes.
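The thought-experiment of 1,000 samples of 1,000 men each is easily carried out by simulation. The sketch below assumes an artificial normal population of "heights" (mean 68, standard deviation 3, both illustrative) and exhibits the mean of means and the standard error:

```python
import random

random.seed(3)

def mean(v):
    return sum(v) / len(v)

def pstdev(v):
    m = mean(v)
    return (sum((x - m) ** 2 for x in v) / len(v)) ** 0.5

# Artificial population of "heights": normal, mean 68, s.d. 3 (illustrative).
def draw_sample(n=1000):
    return [random.gauss(68.0, 3.0) for _ in range(n)]

# 1,000 samples of 1,000 variates each; record each sample's arithmetic mean.
means = [mean(draw_sample()) for _ in range(1000)]

mean_of_means = mean(means)
standard_error = pstdev(means)   # dispersion of the means: the "standard error"
# Formula (12) of the text predicts sigma / s^(1/2) = 3 / 1000^(1/2), about 0.095.
```

The 1,000 means indeed differ but slightly from one another in comparison with the dispersion of individual variates, and their standard deviation agrees with the theory developed later in the chapter.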
Although the process of finding mean values from each
of a large number of equal samples with a large number
of individuals in each sample gives us a useful conception
of the problem of sampling errors in mean values, it would
ordinarily be a laborious and usually an impractical task,
because of paucity of available data, to carry out such a
set of calculations. The statistician ordinarily obtains a
result from a sample by calculation, say a mean value x̄,
and then investigates the standard deviation of such re-
sults without taking further samples. That such a treat-
ment of the problem is possible is clearly an important
mathematical achievement.
The space available in the present monograph will not
permit the derivation of formulas for the standard devia-
tion of sampling errors in many types of averages or
parameters. In fact, we shall limit ourselves to presenting
only sufficient derivations of such formulas to indicate the
nature of the main assumptions and approximations in-
volved in the rationale which supports such formulas, and
certain of their interpretations. Preliminary to deriving
formulas for standard deviations of sampling errors in
certain averages and parameters, we need to find the
standard deviation and correlation of errors in class fre-
quencies of any given frequency distribution. For brevity
we shall use the expression "standard error" in place of
"standard deviation of errors."
41. Standard error and correlation of errors in class
frequencies. Suppose we obtain from a random sample of
a population an observed frequency distribution
f_1, f_2, ...., f_t, ...., f_n
with a number f_t of individuals in a class t, and with a
total of f_1 + f_2 + .... + f_n = s individuals observed in the
sample.
Suppose next that we should obtain a large number of
such samples of s observations each taken under the same
essential conditions. A class frequency f_t will vary from
sample to sample. These values of f_t will form a frequency
distribution. We set the problem of expressing the ex-
pected value of the square of the standard deviation σ_{f_t}
in terms of observed values.
To solve this problem, we may consider that any ob-
servation to be made is a trial, and that it is a success to
obtain an observation for which the individual falls in
the class t. Let p_t be the probability of success in one
trial, and q_t = 1 − p_t be the corresponding probability of
failure.
In sets of s trials with a constant probability p_t of
obtaining an individual in the class t, we have from page
27 that the square of the standard deviation of f_t in the
theoretical distribution is given by

(1) σ_{f_t}² = s p_t q_t = s p_t (1 − p_t) .
In statistical applications, we do not ordinarily know
the exact value of p_t, but accept the relative frequency
f_t/s as an approximation to p_t if s is large. If we thus
accept f_t/s as an approximation to p_t, and substitute
p_t = f_t/s in (1), we obtain

(2) σ_{f_t}² = f_t (1 − f_t/s)

as an approximate value of the square of σ_{f_t} conveniently
expressed in terms of observed frequencies.
The value (2) is regarded as an appropriate approxi-
mation to the value of (1) because (1) may be obtained
from (2) by replacing the quotient f_t/s by its expected
value p_t. It is usually agreed among statisticians, how-
ever, that a better approximation to (1) would be an ex-
pression which as a whole has the second member of (1)
as its expected value. The expected value of the product
f_t(1 − f_t/s) is not the product s p_t(1 − p_t) of the expected
values of its factors, as we shall see in the next paragraph.
It will be found that the second member of the equation

(3) σ_{f_t}² = s f_t (1 − f_t/s) / (s − 1)

has s p_t(1 − p_t) as its expected value, and (3) is therefore
regarded as a better approximation than (2) for express-
ing (1) in terms of observed frequencies. The reason for
the advantage of formula (3) over formula (2) is the sub-
ject of frequent inquiries by students of statistics, and
it is hoped that the discussion here given will contribute
to answering such inquiries.
In accordance with the principle just stated it will
be seen that the error introduced by replacing s p_t(1 − p_t)
by f_t(1 − f_t/s) involves not only sampling errors, but also
a certain systematic error. Thus, although the expected
value of f_t is s p_t (p. 26) and the expected value of 1 − f_t/s
is 1 − p_t, we shall see as stated above that the expected
value of the product f_t(1 − f_t/s) is not equal to the product
s p_t(1 − p_t) of the expected values, but is in fact equal to
(s − 1) p_t(1 − p_t). We may prove this by first expressing
(1), with the help of the definition of σ_{f_t}, in the form

(4) E[(f_t − s p_t)²] = s p_t (1 − p_t) ,
and then applying the last proposition on page 21 which
states that the expected value of the square of the vari-
able x is equal to the square of the expected value of x
increased by the expected value of the square of the de-
viations of x from its expected value. Thus, for a variable
x = f_t with an expected value s p_t, we write

E(f_t²) = s² p_t² + E[(f_t − s p_t)²] = s² p_t² + s p_t(1 − p_t)

from (4). Further,

(5) E[f_t(1 − f_t/s)] = E(f_t) − (1/s) E(f_t²)
        = s p_t − s p_t² − p_t(1 − p_t) = (s − 1) p_t (1 − p_t) .

By multiplying both members of (5) by s/(s − 1), we may
write

s p_t(1 − p_t) = E[ s f_t(1 − f_t/s) / (s − 1) ] .

Thus, in approximating to the value s p_t(1 − p_t) in the
right member of (1) by means of a function of the ob-
served f_t, we note that the function s f_t(1 − f_t/s)/(s − 1)
has the expected value s p_t(1 − p_t) which we seek, and that
f_t(1 − f_t/s), given in the right member of (2) as an approxi-
mation to s p_t(1 − p_t), contains a systematic error.
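The systematic error just described can be made visible in a small simulation: averaged over many samples, f_t(1 − f_t/s) falls short of s p_t(1 − p_t), while s f_t(1 − f_t/s)/(s − 1) does not. The values of s and p_t below are illustrative assumptions:

```python
import random

random.seed(4)

s, p_t = 20, 0.3                       # illustrative values; small s shows the bias
true_value = s * p_t * (1 - p_t)       # s p_t (1 - p_t) of formula (1): 4.2

trials = 100_000
biased = corrected = 0.0
for _ in range(trials):
    f_t = sum(random.random() < p_t for _ in range(s))   # a binomial class frequency
    biased += f_t * (1 - f_t / s)                        # formula (2)
    corrected += s * f_t * (1 - f_t / s) / (s - 1)       # formula (3)

biased /= trials
corrected /= trials
# E[f_t(1 - f_t/s)] = (s - 1) p_t (1 - p_t) = 3.99, short of 4.2 by the factor (s - 1)/s.
```

With s = 20 the deficiency is nearly five per cent; as s grows the two estimates become practically indistinguishable, in accord with the remark that (2) is a valuable first approximation for large s.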
In finding standard errors in means, moments, correla-
tion coefficients, and so on, it is important to know the
correlation between deviations of frequencies in any two
classes. Let δf_t be the deviation of f_t from the theoretical
mean or expected value of the class frequency in taking
a random sample of s variates. Then since f_1 + f_2 + ....
+ f_t + .... + f_n = s, a constant, we have

(6) δf_1 + δf_2 + .... + δf_t + .... + δf_n = 0 .

If our sample has given δf_t more than the expected
number in the class t, it may reasonably be assumed that
a deficiency equal to −δf_t will tend to be distributed
among the other groups in proportion to their expected
relative frequencies.
Now suppose we had a correlation table made of pairs
of values of δf_t and δf_{t'} obtained from a large number of
samples. Consider the array in which δf_t has a fixed value.
By (6), for each sample,

−δf_t = δf_1 + δf_2 + .... + δf_{t−1} + δf_{t+1} + .... + δf_n .

Assume that the amount of frequency in the left mem-
ber of this equality is distributed to terms of the right
member in such proportion that, for a fixed δf_t, the mean
value of δf_{t'} is

(7) −δf_t p_{t'} / (1 − p_t) .

This gives the mean of the array under consideration.
It is fairly obvious that the correlation coefficient r
of N pairs of deviations x and y from mean values is
equal to

r = (1/(N σ_x σ_y)) Σ N_x x ȳ_x ,

where ȳ_x is the mean of the x-array of y's, and N_x is the
number in the array. Then

r σ_x σ_y = mean value of x ȳ_x = (1/N) Σ N_x x ȳ_x .

By attaching this meaning to the correlation coefficient
r_{f_t f_{t'}} of f_t and f_{t'}, and using (7) for the mean of the array,
we have

r_{f_t f_{t'}} σ_{f_t} σ_{f_{t'}} = mean value of [ −δf_t² p_{t'} / (1 − p_t) ]

        = −[p_{t'}/(1 − p_t)] (mean value of δf_t²) = −[p_{t'}/(1 − p_t)] σ_{f_t}²

(8)     = −s p_t p_{t'} , from (1) ,

(9)     = −f_t f_{t'} / s ,

as a first approximation.
A systematic error is involved in replacing s p_t p_{t'} by
f_t f_{t'}/s on account of the correlation between f_t and f_{t'}.
To deal with the effect of this correlation, we may first
write (3), page 83, in the form

(1/N) Σ x_i y_i = x̄ ȳ + r σ_x σ_y ,

the sum extending from i = 1 to i = N.
If we are dealing with a population or theoretical dis-
tribution rather than with a sample, this formula gives
us the proposition that the expected value of the product,
x_i y_i, of pairs of variables is equal to the product, x̄ ȳ, of
their expected values increased by the product, r σ_x σ_y, of
the correlation coefficient and the two standard devia-
tions.
To apply this proposition when x_i = f_t and y_i = f_{t'}, we
note from (8) that, for the population, r σ_x σ_y = −s p_t p_{t'},
and recall that E(f_t) = s p_t and E(f_{t'}) = s p_{t'}. Then the
proposition stated above gives us

E(f_t f_{t'}) = s² p_t p_{t'} − s p_t p_{t'} ,

(10) E(f_t f_{t'}/s) = (1/s) E(f_t f_{t'}) = (s − 1) p_t p_{t'} .
To obtain the right member of (8) as accurately as
possible in terms of the observed f_t and f_{t'}, we multiply
both members of (10) by s/(s − 1) and then note that
f_t f_{t'}/(s − 1) has the expected value s p_t p_{t'}. In the right
member of (9), the value f_t f_{t'}/s used as an approxima-
tion to s p_t p_{t'} thus contains a certain systematic error. To
eliminate the systematic error from (9), we write

(11) r_{f_t f_{t'}} σ_{f_t} σ_{f_{t'}} = −f_t f_{t'} / (s − 1)

in place of (9) as a second approximation to (8).
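Formula (8), the covariance −s p_t p_{t'} between two class frequencies, may likewise be checked by simulation. The three-class probabilities below are illustrative assumptions:

```python
import random

random.seed(5)

s, trials = 50, 100_000
p1, p2 = 0.2, 0.3       # probabilities for two of three classes (illustrative)

sum1 = sum2 = sum12 = 0
for _ in range(trials):
    f1 = f2 = 0
    for _ in range(s):          # classify s variates into one of three classes
        u = random.random()
        if u < p1:
            f1 += 1
        elif u < p1 + p2:
            f2 += 1
    sum1 += f1
    sum2 += f2
    sum12 += f1 * f2

# Observed covariance of the two class frequencies over the samples.
cov = sum12 / trials - (sum1 / trials) * (sum2 / trials)
theory = -s * p1 * p2    # formula (8): -s p_t p_t'
```

The negative sign reflects the constraint (6): an excess in one class must be balanced by a deficiency distributed over the others.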
42. Remarks on the assumptions involved in the deri-
vation of standard errors. The three outstanding assump-
tions that should probably be emphasized in considering
the validity and the limitations of the results (2) and (9)
are (a) that the probability that a variate taken at
random will fall into any assigned class remains constant,
(b) that the number s is so large that we obtain certain
valuable approximations by using the relative frequency
f_t/s in place of the probability p_t that a variate taken at
random will fall into the class t, and (c) that any sampling
deviation δf_t from the expected value of a class frequency
is accompanied by an apportionment of −δf_t to other
class frequencies in amounts proportional to the expected
values of such other class frequencies. Our use of assump-
tion (b) involves more than is apparent on the surface,
because in its use we not only replace a single isolated
probability p_t by a corresponding relative frequency f_t/s,
but we further assume the liberty of using certain func-
tions of the relative frequencies in place of these functions
of the corresponding probabilities or expected values.
This procedure may lead to certain systematic errors in
addition to the sampling errors. For example, we have,
in obtaining (2), used the function f_t(1 − f_t/s) of f_t/s in
place of the same function s p_t(1 − p_t) of the expected
value p_t, and have by this procedure tended to under-
estimate the expected value when s is finite. That is,
s f_t(1 − f_t/s)/(s − 1) and not f_t(1 − f_t/s) is our best esti-
mate of the expected value. However, when s becomes
large, f_t(1 − f_t/s) is a valuable first approximation to the
expected value.
expected value.
The rule that the expected value of a function may be
taken as approximately equal to the function of the ex-
pected value has been much used by statisticians in a
rather loose and uncritical manner. A critical study of the
application and limitations of this rule was published by
Bohlmann^^ in 1913. While it is beyond the scope of this
monograph to enter upon a general discussion of Bohl-
mann's conclusions, it is of special interest for our purpose
that the application of the rule leads at least to first ap-
proximations when the functions in question are algebraic
functions. Although it may seem that we have in the
derivation of (2) and (9) taken the liberty to substitute
relative frequencies rather freely in place of the proba-
bilities required in an exact theory, this procedure may be
extended to any algebraic functions when the number s
is very large, with the expectation of obtaining useful
approximations. Since certain derivations which follow
make use of (2) and (9), the resulting formulas involve the
weaknesses and limitations of the above assumptions.
43. Standard error in the arithmetic mean and in a
qth moment coefficient about a fixed point. For the arith-
metic mean of s observed values of a variable x we write

x̄ = (1/s) Σ_{t=1}^{n} f_t x_t ,

where f_t is the class frequency of x_t.
Suppose the s values constitute a random sample of
observations on the variable x. Suppose further that we
continue taking observations on x until we have a very
large number of random samples each consisting of s ob-
served values. Then assume that there exists an expected
value of each f_t about which the observed f_t's exhibit
dispersion, and that corresponding to these expected val-
ues there exists a theoretical mean value x̄ of x about
which the x̄'s calculated from samples of s exhibit dis-
persion. Using δf and δx̄ to denote deviations in any sam-
ple from the expected values of f and x̄, respectively, we
write

s δx̄ = Σ x_t δf_t ,

s²(δx̄)² = Σ(x_t² δf_t²) + 2 Σ'(x_t x_{t'} δf_t δf_{t'}) ,

where the sum Σ extends from t = 1 to t = n, and Σ' is the
sum for all values of t and t' for which t ≠ t'.
Next, sum both members of this equality for all sam-
ples and divide by the number of samples. This gives, in
the notation for standard deviations (p. 119) and for the
correlation coefficient (p. 123),

s² σ_x̄² = Σ(x_t² σ_{f_t}²) + 2 Σ'(x_t x_{t'} σ_{f_t} σ_{f_{t'}} r_{f_t f_{t'}}) .

By using (1) and (8), we have

s σ_x̄² = Σ(x_t² p_t) − Σ(x_t² p_t²) − 2 Σ'(x_t x_{t'} p_t p_{t'})
       = μ_2′ − (Σ x_t p_t)² = μ_2′ − x̄² = σ² ,

where σ is the standard deviation of the theoretical dis-
tribution. Then

(12) σ_x̄ = σ / s^{1/2} .
Instead of the σ of the theoretical distribution, we
ordinarily use the σ obtained from a sample. To introduce
the expected value of σ² from the sample, we may, for a
first approximation, use (2) and (9) in place of (1) and
(8) above, and obtain very simply a form identical with
(12).
As a second approximation, we may use (3) and (11)
in place of (1) and (8) above, and obtain very simply
(13) σ_x̄² = σ²/(s − 1) and σ_x̄ = σ/(s − 1)^{1/2} ,

where σ is to be obtained from the sample.
The distinction between the expected value of σ² from
the population and from the sample involves a rather
delicate point, but one that has been long recognized in
the literature of error theory. The distinction has been
rather generally ignored in books on statistics. In nu-
merical problems, the differences in the results of formulas
(12) and (13) are negligible when s is large.
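The distinction between (12) and (13) can be exhibited by simulation with a small s. In the sketch below (illustrative values s = 10, σ = 2), the average sample variance falls short of σ² by the factor (s − 1)/s, so that dividing it by s − 1 rather than s recovers the true σ²/s:

```python
import random

random.seed(6)

s, sigma, trials = 10, 2.0, 50_000

def mean(v):
    return sum(v) / len(v)

def pvar(v):
    m = mean(v)
    return sum((x - m) ** 2 for x in v) / len(v)   # divisor s, about the sample mean

means, sample_vars = [], []
for _ in range(trials):
    sample = [random.gauss(0.0, sigma) for _ in range(s)]
    means.append(mean(sample))
    sample_vars.append(pvar(sample))

observed_se_sq = pvar(means)          # the true sigma_xbar^2, here sigma^2/s = 0.4
avg_sample_var = mean(sample_vars)    # tends to (s - 1)/s * sigma^2 = 3.6, not 4.0

est_12 = sigma ** 2 / s               # formula (12), population sigma known
est_13 = avg_sample_var / (s - 1)     # formula (13), with the sample sigma
```

This is the "delicate point" of the text: the sample variance systematically underestimates the population variance, and the divisor s − 1 in (13) compensates for that underestimate.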
The standard deviation (standard error) may well
serve as a measure of sampling fluctuations. But custom
has not established the direct use of the standard error
to any considerable extent. The so-called probable error
has come into much more common use than the standard
error. The probable error E is sometimes defined very sim-
ply as .6745 times the standard error without regard to the
nature of the distribution. This definition of the probable
error does not impose the condition that the distribution
of results obtained on repetition shall necessarily be a
normal distribution. But with such a definition of prob-
able error, the real difficulty is not overcome, but merely
shifted to the point where we attempt an interpretation
of the probable error in terms of the odds in favor of or
against an observed result obtained from a sample falling
within an assigned deviation of the true value.
Thus, in the derivation of (12) we have obtained, sub-
ject to certain important limitations, the standard devia-
tion of means x̄ obtained from samples of s about a theo-
retical mean value which may ordinarily be regarded
as a sort of true value of the mean. If the distribution
of x̄'s obtained from samples about such a true value is
assumed to be a normal distribution, we may by the use
of the table of the probability integral state at once that
the odds are even that an x̄ obtained from a sample will
differ numerically from the true value by not more than
E = .6745 (standard error) .
It is the assumption of a normal distribution of the means
from samples combined with the specification of an even
wager that brings the multiplier .6745 into the problem.
We may further expedite the treatment of sampling
errors by finding the odds in favor of or against an ob-
served deviation from the true value not exceeding nu-
merically a certain multiple of E, say tE. As t increases
to 5, 6, or more, the odds in favor of obtaining a deviation
smaller than tE are so large as to make it practically
certain that we will obtain such a smaller deviation.
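Under the normality assumption, the odds attached to a multiple tE of the probable error follow directly from the probability integral. A sketch using the error function (the function name is ours):

```python
import math

def odds_within(t):
    """Odds that a normally distributed result falls within t*E of the
    true value, where E = .6745 * (standard error)."""
    z = 0.6745 * t                       # deviation measured in standard errors
    p = math.erf(z / math.sqrt(2))       # probability integral for the interval
    return p / (1 - p)                   # odds in favor

even = odds_within(1)    # close to 1:1, by the construction of .6745
large = odds_within(5)   # overwhelming odds, as the text remarks
```

It is the choice of .6745 as the multiplier that makes t = 1 correspond to an even wager; for t = 5 or 6 the odds in favor run to the order of a thousand to one and beyond.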
We have discussed briefly the meaning and limitations
of probable errors. The most outstanding limitation on
the interpretation of probable errors is the requirement
of a normal distribution of the statistical constant under
consideration. We have to a considerable extent used the
arithmetic mean as an illustration, but the same general
requirements about the normality of the distribution
would clearly apply, whatever the statistical constant.
We shall consider next the standard error in a qth
moment coefficient μ_q′ about a fixed point. By definition,

μ_q′ = (1/s) Σ f_t x_t^q .

For the relation between deviations from theoretical val-
ues we have

s δμ_q′ = Σ x_t^q δf_t .

Then

s²(δμ_q′)² = Σ(x_t^{2q} δf_t²) + 2 Σ'(x_t^q x_{t'}^q δf_t δf_{t'}) .
Sum both members of this equality for a large number
of samples N and divide by N. This gives, in the notation
for standard deviations (p. 119) and for the correlation
coefficient (p. 123),

s² σ_{μ_q′}² = Σ(x_t^{2q} σ_{f_t}²) + 2 Σ'(x_t^q x_{t'}^q σ_{f_t} σ_{f_{t'}} r_{f_t f_{t'}}) .

Using (1) and (8), we have

s σ_{μ_q′}² = Σ(x_t^{2q} p_t) − Σ(x_t^{2q} p_t²) − 2 Σ'(x_t^q x_{t'}^q p_t p_{t'})
         = μ_{2q}′ − μ_q′² .

Then

(14) σ_{μ_q′} = [ (μ_{2q}′ − μ_q′²) / s ]^{1/2} ,
where the moments in the right-hand member relate to
the theoretical distribution. By methods analogous to
those used in the case of the arithmetic mean (pp. 127-
28), we may pass to moments which relate to the sample.
The probable error of μ_q′ is then E = .6745 σ_{μ_q′}, and
the usual interpretation of such a probable error by means
of odds in favor of or against deviations less than a multi-
ple of E is again dependent on the assumption that the
qth moments μ_q′ found from repeated trials form a
normal distribution.
44. Standard error of the qth moment μ_q about a
mean. In considering the problem of the standard error
of a moment about a mean, it is important to recognize
the difference between the mean of the population and a
mean obtained from a sample.
For simplicity, we shall consider the problem of the
standard error in a qth moment about the mean of the
population when we take samples of s variates as in § 43.
The mean of the population is a fixed point about which
we take the qth moment of each sample of s variates.
Then if we follow the usual plan of dropping the primes
from the μ's to denote moments about a mean, we write
from (14)

σ_{μ_q}² = (μ_{2q} − μ_q²) / s

for the square of the standard error of μ_q in terms of
moments of the theoretical distribution.
In particular, we have for the standard error of the
second moment

σ_{μ_2}² = (μ_4 − μ_2²) / s .

When the distribution is normal,

μ_4 = 3 μ_2² , and σ_{μ_2}² = 2 μ_2² / s .

Since σ = (μ_2)^{1/2}, we have

δμ_2 = 2 σ δσ

nearly.
Square each member, sum for all samples, and divide
by the number of samples. This gives

4 σ² σ_σ² = σ_{μ_2}² = 2 σ⁴ / s , or σ_σ = σ / (2s)^{1/2} .

Hence, the probable error in approximating to the
standard deviation σ of the population by the standard
deviation from a sample of s variates is given approxi-
mately by

.6745 σ_σ = .6745 σ / (2s)^{1/2} .
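The result σ_σ = σ/(2s)^{1/2} may be verified by simulation (the values s = 200 and σ = 1.5 below are illustrative):

```python
import random

random.seed(7)

s, sigma, trials = 200, 1.5, 10_000

def pstdev(v):
    m = sum(v) / len(v)
    return (sum((x - m) ** 2 for x in v) / len(v)) ** 0.5

# Standard deviation computed from each of many samples of s normal variates.
sds = [pstdev([random.gauss(0.0, sigma) for _ in range(s)]) for _ in range(trials)]

observed = pstdev(sds)               # dispersion of the sample standard deviations
predicted = sigma / (2 * s) ** 0.5   # sigma / (2s)^(1/2) = 0.075 here
```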
To avoid misunderstanding, it should perhaps be em-
phasized that we have throughout this section restricted
our discussion to the qth moment about the mean of the
population. The problem of dealing with the standard
error in the qth moment about the mean of a sample
offers additional difficulties because such a mean varies
from sample to sample. A problem arises from the cor-
relation of errors in the means and in the corresponding
moments. Further problems arise in considering the close-
ness of certain approximations, especially when the mo-
ments are of fairly high order, that is, when q is large.
We shall simply state without demonstration that the
square of the standard error in the qth moment about
the mean of a sample is given by

σ_{μ_q}² = (μ_{2q} − μ_q² + q² μ_2 μ_{q−1}² − 2q μ_{q−1} μ_{q+1}) / s

as a first approximation. For q = 2, this expression be-
comes (μ_4 − μ_2²)/s. For q = 4, it becomes (μ_8 − μ_4²)/s in the
case of a normal distribution. These expressions for the
special cases q = 2 and q = 4 are the same as for the mo-
ments about a fixed point.
45. Remarks on the standard errors of various statis-
tical constants. We have shown a method of derivation
of the standard errors in certain statistical constants (the
mean, the qth moment about a fixed point), and in partic-
ular the derivation of probable error of the mean. Our
main purpose has been to indicate briefly the nature of
the assumptions involved in the derivation of the most
common probable-error formulas. The next step would
very naturally consist in finding the correlations of errors
in two moments. Following this, we could deal with the
general problem of standard errors in parameters of fre-
quency functions of one variable on the assumption that
the parameters may be expressed in terms of moment
coefficients. Thus, let

y = f(x, c_1, c_2, ....)

be any frequency curve, where any parameter

c_i = φ(x̄, μ_2, μ_3, ...., μ_q, ....)

is a function of the mean and of moments about the mean.
Suppose that this relation is such that we may express
δc_i in terms of δx̄, δμ_2, δμ_3, ...., at least approximately,
by differentiation of the function φ. If we then square
δc_i, sum, and divide by the number of samples, we obtain
an approximation to the square of the standard error in c_i.
While, in a general way, this method may be described
as a straightforward procedure, the derivation of useful
formulas is likely to involve rather laborious algebraic de-
tails. Moreover, considerable difficulty may arise in esti-
mating the errors involved in the approximate results.
The difficulties of estimating the magnitude of the
errors involved are likely to be much increased when the
statistical constant, for example, a correlation coefficient,
is a function not merely of the moments of the separate
variables, but also of the product moments of two vari-
ables.
In concluding these remarks on standard errors of sta-
tistical parameters obtained from moments of observa-
tions, it may be of interest to point out that the character-
ization of the sampling fluctuations in such parameters
may be extended and refined by the use of higher-order
moments of the errors in the parameters. B. H. Camp has
shown that the use of moments of order higher than two
may very naturally be accompanied by the use of a cer-
tain number of terms of Gram-Charlier series as a dis-
tribution function.^®
46. Standard error of the median. Thus far in our
discussion of standard errors and probable errors, we have
assumed that the statistical constants or characteristics
of the frequency function are given as functions of the
moments. There are, however, useful characteristics such
as a median, a quartile, a decile, and a percentile of a
distribution which are not ordinarily given as functions
of moments. Such a characteristic number used in the
description of a distribution is ordinarily calculated from
its definition, which specifies that its value is such that a
certain fractional part of the total frequency is on either
side of the value in question. For example, a median m
of a given distribution is ordinarily calculated from the
definition that variates above and below m are to be
equally frequent. Similarly, a fourth decile D_4 is calcu-
lated from the definition that four-tenths of the frequency
is to be below D_4. We are thus concerned with the sam-
pling fluctuations of the bounds of the interval which in-
cludes an assigned proportion of the frequency.
To illustrate further, let us consider the standard error
in the median m of samples of N of a variable x distributed
in accord with a continuous law of frequency given by
y = f(x). We assume that there exists a certain ideal medi-
an value M of the population of which we have a sample
of N and that by definition of the median 1/2 is then the
probability that a variate taken at random falls above (or
below) M. We may then write that in any sample of
N variates taken at random from the indefinitely large
set, the number above M is N/2 + d. That is, the median
m of the sample is at a distance δx = δm from M. When
y has a value corresponding to a value of x in the inter-
val δm, we may write

y δm = d

to within infinitesimals of higher order.
Such an equation connects the change δm in the me-
dian of the sample from the theoretical M with the sam-
pling deviation d of the frequency above M. Then

δm = d/y and σ_m = (1/y) σ_d .

But, from (1), page 119,

σ_d² = N p q = N/4 . Hence σ_m = N^{1/2} / (2y) .
If we have a normal distribution

y = (1/(σ (2π)^{1/2})) e^{−x²/(2σ²)} ,

the value of y at the median is given by

1/(σ (2π)^{1/2}) = .39894/σ ,

and the standard error in the median found from ranks is

(15) σ_m = 1.2533 σ / N^{1/2} .
Although the theoretical values of the median and of
the arithmetic mean are equal in a normal distribution,
the median found from a sample by ranking has a sam-
pling error 1.2533 times as large as the arithmetic mean
obtained as a first moment from the same sample.
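The factor 1.2533 can be recovered by simulation: draw many samples from a normal population and compare the dispersion of the rank medians with that of the arithmetic means. The sample size and number of trials below are illustrative:

```python
import random

random.seed(8)

N, trials = 401, 10_000   # odd N makes the rank median the middle variate

medians, means = [], []
for _ in range(trials):
    sample = [random.gauss(0.0, 1.0) for _ in range(N)]
    medians.append(sorted(sample)[N // 2])   # median found from ranks
    means.append(sum(sample) / N)            # mean found as a first moment

def pstdev(v):
    m = sum(v) / len(v)
    return (sum((x - m) ** 2 for x in v) / len(v)) ** 0.5

se_median = pstdev(medians)
se_mean = pstdev(means)
ratio = se_median / se_mean   # theory for a normal population: about 1.2533
```

Both averages estimate the same theoretical value here, yet the median fluctuates about a quarter more than the mean, which is the point of formula (15).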
47. Standard deviation of the sum of independent
variables. In sampling problems, it is often found useful
to know the expected value of the square of the standard
deviation of the sum v = X_1 + X_2 + .... + X_s of s mu-
tually independent variables when we have given the
standard deviations σ_1, σ_2, ...., σ_s of each variable in
the population to which it belongs.
Assuming that the given deviations are measured from
the theoretical or expected values for the populations, we
consider deviations x_i = X_i − E(X_i), and write the devia-
tion of the sum

v = x_1 + x_2 + .... + x_s .
Square both sides, sum for the number of samples
N, and divide by N. Then we have

(1/N) Σ v² = (1/N) Σ (x_1² + .... + x_s²) + (2/N) Σ (x_1 x_2 + x_1 x_3 + ....) .

If we pass to expected values, and let σ_1², σ_2², ...., σ_s²
denote the squares of standard deviations of the several
variables and σ_v² that of their sum in the populations,
we have

(16) σ_v² = σ_1² + σ_2² + .... + σ_s² ,

the product terms vanishing by V, page 117.
It is a matter of some interest to note how the ex-
pected value just found differs from the expected value
of the sum of squares of the s deviations of x_1, x_2, ...., x_s
from their mean

(1/s) Σ_{i=1}^{s} x_i

obtained from a sample. If we let

(17) x_i′ = x_i − (1/s) Σ x_i ,

we are to find E(x_1′² + x_2′² + .... + x_s′²) in terms of
E(x_i²) = σ_i² (i = 1, 2, ...., s). From (17) we may write

x_1′ = ((s − 1)/s) x_1 − x_2/s − .... − x_s/s ,
. . . . . . . . . . . . . . . . . . . .
x_s′ = ((s − 1)/s) x_s − x_1/s − .... − x_{s−1}/s .

Then for i ≠ j we have

x_1′² + .... + x_s′² = ((s − 1)/s)(x_1² + .... + x_s²) − (2/s) Σ x_i x_j .
Hence, passing to expected values, using V, page 117,

(18) E(x_1′² + .... + x_s′²) = ((s − 1)/s)(σ_1² + .... + σ_s²) .
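Formula (18) is easily checked by simulation with unequal standard deviations; the values of s and σ_i below are illustrative assumptions:

```python
import random

random.seed(9)

s = 5
sigmas = [1.0, 2.0, 0.5, 1.5, 1.0]   # one s.d. per independent variable (illustrative)
trials = 100_000

total = 0.0
for _ in range(trials):
    xs = [random.gauss(0.0, sd) for sd in sigmas]
    m = sum(xs) / s                              # the sample mean of the s deviations
    total += sum((x - m) ** 2 for x in xs)       # sum of squares about that mean

observed = total / trials
predicted = (s - 1) / s * sum(sd ** 2 for sd in sigmas)   # formula (18): 0.8 * 8.5 = 6.8
```

The deficiency by the factor (s − 1)/s, as against formula (16), is again the systematic shortfall that comes of measuring deviations from a mean computed from the sample itself.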
48. Remarks on recent progress with sampling errors
of certain averages obtained from small samples. In the
development of the theory of sampling, the assumption
has usually been made that the sample contains a large
number of individuals, thus leading to the expectation
that the replacement of probabilities by corresponding
relative frequencies will give a valuable approximation.
But the lower bound of large numbers has remained poor-
ly defined in this connection. For example, certain prob-
able-error formulas have been applied to as few as ten
observations.
Beginning with a paper by Student^' in 1908 there
have been important experimental and theoretical results
obtained on the distribution of arithmetic means, stand-
ard deviations, and correlation coefficients obtained from
small samples.
In 1915, Karl Pearson^^ took an important step in ad-
vance by obtaining the curve
(19) y = y_0 x^{n−2} e^{−n x²/(2σ²)}
for the distribution of the standard deviations of samples
of n variates from an infinite population distributed in
accord with the normal curve.
By finding the moments μ_2, μ_3, and μ_4 of this theoreti-
cal distribution, and then tabulating the corresponding
β_1 and β_2 and the skewness of the curve (19) for integral
values of n from 4 to 100, and making use of the fact that
β_1 = 0, β_2 = 3, and sk (skewness) = 0 are necessary conditions for
a normal distribution, Pearson shows experimentally that
the distribution of standard deviations given by (19) ap-
proaches practically a normal distribution as n increases.
In this experiment, the necessary conditions β_1 = 0, β_2 = 3,
and sk = 0 are assumed to be sufficient for practical ap-
proach to a normal distribution.
From this table of values, Pearson concludes that for
samples of 50 the usual theory of probable error of the
standard deviation holds satisfactorily, and that to apply
it to samples of 25 would not lead to any error of impor-
tance in the majority of statistical problems. On the other
hand, if a small sample, n<20 say, of a population be
taken, the value of the standard deviation found from
the sample tends to be less than the standard deviation
of the population.
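For the modern reader, this downward tendency is easy to exhibit by simulation. The sketch below is illustrative only; the population σ = 1, the sample size n = 10, and the divisor n in the standard deviation are assumptions made in the spirit of the discussion above:

```python
import random

random.seed(42)
POP_SD = 1.0      # population standard deviation (assumed)
n = 10            # a small sample, n < 20
trials = 20000

def sample_sd(xs):
    # standard deviation about the sample mean, with divisor n
    m = sum(xs) / len(xs)
    return (sum((x - m) ** 2 for x in xs) / len(xs)) ** 0.5

mean_sd = sum(sample_sd([random.gauss(0, POP_SD) for _ in range(n)])
              for _ in range(trials)) / trials
print(mean_sd)    # falls noticeably below the population value 1.0
```

With these settings the average of the sample standard deviations comes out near 0.92, below the population value, in agreement with Pearson's remark.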
In a paper published in 1915, R. A. Fisher^^ dealt with
the frequency distribution of the correlation coefficient r
derived from samples of n pairs each taken at random
from an infinite population distributed in accord with the
normal correlation surface (p. 104), where ρ is the cor-
relation coefficient. The frequency function y = fₙ(r)
given by Fisher for the distribution of r was such that
the investigation of its approach to a normal curve as n
increases seemed to require special methods for comput-
ing the ordinates and moments. Such special methods
were given in a joint memoir^° by H. E. Soper, A. W.
Young, B. M. Cave, A. Lee, and Karl Pearson. The val-
ues of β₁ and β₂ were computed for these distributions to
study the approach to the normal curve.
With respect to the approach of these distributions
to the normal form with increasing values of n, it is found
that the necessary conditions β₁ = 0, β₂ = 3 for a normal
distribution are not well fulfilled for samples of 25 or
even 50, whatever the value of ρ. For samples of 100, the
approach to the conditions β₁ = 0, β₂ = 3 is fair for low
values of ρ, but for large values of ρ, say ρ > .5, there is
considerable deviation of β₁ from 0, and of β₂ from 3. For
samples of 400, on the whole, the approach to the neces-
sary conditions β₁ = 0, β₂ = 3 is close, but there is quite a
sensible deviation from normality when ρ ≥ .8. These re-
sults give us a striking warning of the dangers in inter-
preting the ordinary formula for the probable error of r
when we have small samples.
As to the limitations on the generality of these results,
it should be remembered that the assumption is made, in
this theory of the distribution of r from small samples,
that we have drawn samples from an infinite population
well described by a normal correlation surface, so that the
conclusions are not in the strictest sense applicable to
distributions not normally distributed. While the results
just now described have thrown much light on the dis-
tributions of statistical constants calculated from small
samples, it is fairly obvious that much remains to be done
on this important problem.
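A present-day reader can check these findings by simulation; the sketch below (the values ρ = .8, n = 10, and the number of repetitions are illustrative assumptions) draws samples of pairs from a normal correlation surface, computes r for each, and estimates β₁ = μ₃²/μ₂³ for the resulting distribution of r:

```python
import random, math

random.seed(3)
rho, n, reps = 0.8, 10, 5000    # illustrative choices

def sample_r():
    # n pairs from a normal correlation surface with correlation rho
    xs = [random.gauss(0, 1) for _ in range(n)]
    ys = [rho * x + math.sqrt(1 - rho * rho) * random.gauss(0, 1) for x in xs]
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

rs = [sample_r() for _ in range(reps)]
m = sum(rs) / reps
mu = lambda k: sum((r - m) ** k for r in rs) / reps
beta1 = mu(3) ** 2 / mu(2) ** 3
print(beta1)    # well away from the value 0 required of a normal curve
```

The estimated β₁ comes out well away from 0, confirming that the normal-theory probable error of r is untrustworthy for such small samples.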
49. The recent generalizations of the Bienayme-
Tchebycheff criterion. Although the use of probable errors
for judging of the general order of magnitude of the nu-
merical values of sampling deviations is a great aid to com-
mon-sense judgment, it must surely be granted that we
are much hampered in drawing certain inferences depend-
ing on probable errors because of the limitation that the
interpretation of the probable error of a statistical con-
stant is to some extent dependent in any particular case
on the normality of the distribution of such constants
obtained from samples, and because of the lack of knowl-
edge as to the nature of the distribution.
Any theory that would deal effectively with the prob-
lem of finding a criterion for judging of the magnitude of
sampling errors with little or no limitation on the nature
of the distribution would be a most welcome contribution,
especially if the theory could be made of value in dealing
with actual statistical data. The Bienaymé-Tchebycheff
criterion (p. 29) may be regarded as an important step
in the direction of developing such a theory. We have in
the Tchebycheff inequality a theorem specifying an upper
bound 1/λ² for the probability that a datum taken at
random will be equal to or greater than λ times the stand-
ard deviation without limitation on the nature of the
distribution. That is, if P(λσ) is the probability that a
datum drawn at random from the entire distribution will
differ in absolute value from the mean of all values as
much as λσ, then

(20) P(λσ) ≤ 1/λ² .
To establish a first generalization of this inequality
(cf. p. 29), let us consider a variable x which takes mutual-
ly exclusive values x₁, x₂, . . . . , xₙ with corresponding
probabilities p₁, p₂, . . . . , pₙ, where p₁ + p₂ + ⋯ + pₙ = 1.
Let a be any number from which we wish to measure
deviations. For the expected value of the moment of
order 2s about a, we may write

μ′₂ₛ = p₁d₁^{2s} + p₂d₂^{2s} + ⋯ + pₙdₙ^{2s} ,

where dᵢ = xᵢ − a.
Let d′, d″, . . . . , be those deviations xᵢ − a which are
numerically as large as an assigned multiple λσ (λ > 1) of
the root-mean-square deviation σ, and let p′, p″, . . . . , be
the corresponding probabilities. Then we have

μ′₂ₛ ≥ p′d′^{2s} + p″d″^{2s} + ⋯ .

Since d′, d″, . . . . , are each numerically as large as
λσ, we have

μ′₂ₛ ≥ λ^{2s}σ^{2s}(p′ + p″ + ⋯) .

If we let P(λσ) be the probability that a value of x
taken at random will differ from a numerically by as much
as λσ, then P(λσ) = p′ + p″ + ⋯ . Then

P(λσ) ≤ μ′₂ₛ / (λ^{2s}σ^{2s}) ,

and the probability of obtaining a deviation numerically
less than λσ is greater than

1 − μ′₂ₛ / (λ^{2s}σ^{2s}) .
This generalization of the Tchebycheff inequality is
due to Karl Pearson^* except that he assumed a distribu
tion given by a continuous function with a as the mean
x-coordinate of the centroid of frequency area. For this
case, we should merely drop the prime from μ′₂ₛ, and write

(21) P(λσ) ≤ μ₂ₛ / (λ^{2s}σ^{2s}) .
With s = 1, we obviously have the Tchebycheff inequality
as a special case.
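Both (20) and (21) are easy to verify on a concrete non-normal distribution. The following sketch uses an exponential population (an arbitrary illustrative choice) and the case s = 2 of (21), for which μ₄ is the fourth moment about the mean:

```python
import random

random.seed(1)
data = [random.expovariate(1.0) for _ in range(200000)]   # a skewed population
mean = sum(data) / len(data)
mu = lambda k: sum((x - mean) ** k for x in data) / len(data)
sigma = mu(2) ** 0.5

lam, s = 4.0, 2
p_tail = sum(abs(x - mean) >= lam * sigma for x in data) / len(data)

tchebycheff = 1 / lam ** 2                                  # bound (20)
pearson = mu(2 * s) / (lam ** (2 * s) * sigma ** (2 * s))   # bound (21), s = 2
print(p_tail, tchebycheff, pearson)
```

For λ = 4 the Pearson bound is sharper than the Tchebycheff bound, though both remain far above the actual tail probability.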
It is Pearson's view that, although his inequality is in
most cases a closer inequality than that of Tchebycheff,
it is usually not close enough to an equality to be of
practical assistance in drawing conclusions from statisti-
cal data. On the whole, Pearson expresses not only dis-
appointment at the results of the Tchebycheff inequality,
but holds that his own generalization still lacks, in gen-
eral, the degree of approximation which would make the
result of real value in important statistical applications.
Hence, it is an important problem to obtain closer in-
equalities. The problem of closer inequalities has been
dealt with in recent papers by several mathematicians.®^
Camp, Guldberg, Meidel, and Narumi have succeeded
particularly well by placing certain mild restrictions on
the nature of the distribution function F{x). The restric-
tions are of such a nature as to leave the distribution func-
tion sufficiently general to be useful in the actual prob-
lems of statistics. The main restriction placed on F(x)
by Camp is that it is to be a monotonic decreasing func-
tion of |x| when |x| ≥ cσ, c ≥ 0. The general effect of this
restriction is to exclude distributions which are not rep-
resented by decreasing functions of |x| at points more
than a certain assigned distance from the origin. We shall
now present the main results of Camp without proof.
With the origin so chosen that zero is at the mean, he
reaches the generalized inequality

(22) P(λσ) ≤ β_{2s−2} / [(1 + 1/2s)^{2s} (λ − c)^{2s}] ,

where

β_{2s−2} = μ₂ₛ / σ^{2s} .

When c = 0, the formula (22) is Pearson's formula (21)
divided by (1 + 1/2s)^{2s}.
The general effect of the work of Camp and Meidel
has been to decrease the upper bound given by the Pearson
inequality (21) by roughly 50 per cent. These generaliza-
tions seem to have both theoretical and practical value
when we have regard for the fact that the results apply
to almost any type of distribution that occurs in practical
applications. Indeed, it is so satisfying to have only very
mild restrictions on the nature of the distribution in judg-
ing sampling errors that further progress in extending the
cautious limits of sampling fluctuations given by the
generalizations of the Tchebycheff inequality would be
of fundamental value.
50. Remarks on the sampling fluctuations of an ob-
served frequency distribution from the underlying theo-
retical distribution. If we have fitted a theoretical fre-
quency curve to an observed distribution, or if we know
the theoretical frequencies from a priori considerations,
the question often arises as to the closeness of fit of theory
and observation. In considering this question, a criterion
is needed to assist common-sense judgment in testing
whether the theoretical curve or distribution fits the ob-
served distribution well or not. It is beyond the scope of
the present monograph to deal with the theory underly-
ing such a criterion, but it seems desirable to remark that
the fundamental paper on this important problem of
random sampling was contributed by Karl Pearson under
the title, "On the criterion that a given system of devia-
tions from the probable in the case of a correlated system
of variables is such that it can be reasonably supposed
to have arisen from random sampling," Philosophical
Magazine, Volume 50, Series 5 (1900), pages 157-75.
Closely related to the problem of the closeness of fit
of theory and observation is the fundamental problem of
establishing a criterion for measuring the probability that
two independent distributions of frequency are really
random samples of the same population. Pearson pub-
lished one solution of this problem in Biometrika,
Volume 8 (1911-12), pages 250-54. The resulting crite-
rion represents an important achievement of mathemati-
cal statistics as an aid to common-sense judgment in con-
sidering the circumstances surrounding the origin of a
random sample of data.
CHAPTER VI
THE LEXIS THEORY
51. Introduction. We have throughout Chapter II as-
sumed a constant probability underlying the frequency
ratios obtained from observation. It is fairly obvious that
frequency ratios are often found from material in which
the underlying probability is not constant. Then the sta-
tistician should make use of all available knowledge of the
material for appropriate classification into subsets for
analysis and comparison. It thus becomes important to
consider a set of observations which may be broken into
subsets for examination and comparison as to whether
the underlying probability seems to be constant from sub-
set to subset. In the separation of a large number of rela-
tive frequencies into n subsets according to some appro-
priate principle of classification, it is useful to make the
classification so that the theory of Lexis may be applied.
In the theory of Lexis we consider three types of series
or distributions characterized by the following properties:
1. The underlying probability p may remain a con-
stant throughout the whole field of observation. Such a
series is called a Bernoulli series, and has been considered
in Chapter II.
2. Suppose next that the probability of an event
varies from trial to trial within a set of s trials, but that
the several probabilities for one set of s trials are identical
to those of every other of n sets of s trials. Then the
series is called a Poisson series.
3. When the probability of an event is constant from
trial to trial within a set but varies from set to set, the
series is called a Lexis series.
The theory of Lexis^^ uses these three types as norms
for comparison of the dispersions of series which arise in
practical problems of statistics. An estimate of the im-
portance of this theory may probably be formed from the
facts that Charlier^ states in his Vorlesungen über mathe-
matischen Statistik (1920) that it is the first essential step
forward in mathematical statistics since the days of La-
place, and that J. M. Keynes®^ expressed a somewhat
similar opinion in his Treatise on Probability (1920).
These may be somewhat extreme views when we recall
the contributions of Poisson, Gauss, Bravais, and Tcheby-
cheff but they at least throw light on the outstanding
character of the contribution of Lexis to the theory of
dispersion. The characteristic feature of the method of
Lexis is that it encourages the analysis of the material
by breaking up the whole series into a set of sub-series for
examination of the fluctuation of the frequency among
the various sub-series. Such a plan of analysis surely has
the sanction of common-sense judgment.
In drawing s balls one at a time with replacements
from an urn of such constitution that p is the constant
probability that a ball to be drawn will be white, we have
already established the following results for Bernoulli
series:
1. The mathematical expectation of the number of
white balls is sp (p. 26).
2. The standard deviation of the theoretical distribu-
tion of frequencies is (spq)^{1/2} (p. 27).
3. The standard deviation of the corresponding dis-
tribution of relative frequencies is (pq/s)^{1/2} (p. 27).
52. Poisson series. To develop the theory of the Pois-
son series let s urns,

U₁, U₂, . . . . , Uₛ ,

contain white and black balls in such relative numbers
that

p₁, p₂, . . . . , pₛ

are the probabilities corresponding to the respective urns
that a ball to be drawn will be white. Let

(1) p = (1/s)(p₁ + p₂ + ⋯ + pₛ) .

From (1) it follows that the mathematical expectation sp
of white balls in a set of s obtained one from each urn
is exactly equal to the mathematical expectation of white
balls in drawing s balls with a constant probability p of
success. The standard deviation σ_P of the theoretical dis-
tribution of the number of white balls per set of s is re-
lated to the standard deviation σ_B = (spq)^{1/2} of a hypo-
thetical Bernoulli distribution with a constant probability
p of success, by the equation

(2) σ_P² = spq − Σ_{t=1}^{s}(p_t − p)² = σ_B² − Σ_{t=1}^{s}(p_t − p)² ,

where p is equal to the mean value of p₁, p₂, . . . . , pₛ.
To prove this we start with (1) and recall that sp is the
arithmetic mean of the number of white balls in any set
of s under the theoretical distribution.
Let us consider next the standard deviation σ of white
balls in the theoretical series of s balls. The square of
the standard deviation of the frequency of white balls in
drawing a single ball with the chance p_t that it will be
white is given by σ_t² = p_t q_t; that is, by making s = 1 in
s p_t q_t.
When the probabilities p₁, p₂, . . . . , pₛ are inde-
pendent of one another, it follows from (16), page 137,
that

σ² = σ₁² + σ₂² + ⋯ + σₛ² ,

where σ₁, σ₂, . . . . , σₛ are the standard deviations of
white balls in drawing one ball from each urn correspond-
ing to probabilities p₁, p₂, . . . . , pₛ, respectively, and σ
is the standard deviation of white balls among the s balls
together drawn one from each urn.
Hence, we have

(3) σ² = p₁q₁ + p₂q₂ + ⋯ + pₛqₛ = Σ_{t=1}^{s} p_t q_t .

But

p_t = p + (p_t − p) ,  q_t = q − (p_t − p) .

Hence

p_t q_t = pq − (p_t − p)(p − q) − (p_t − p)² ,

and

(4) Σ_{t=1}^{s} p_t q_t = spq − Σ_{t=1}^{s}(p_t − p)² , since Σ_{t=1}^{s}(p_t − p) = 0 .
Hence, we have established (2), from which it follows at
once that the standard deviation of a Poisson series is less
than that of the corresponding Bernoulli series with con-
stant probability of success equal to the arithmetic mean
of the variable probabilities of success.
To give an illustration of a Poisson series, conceive of
n populated districts. Each district is to consist of s sub-
divisions for which the probability of death at a given age
varies from one subdivision to another, but in which the
series of s probabilities are identical from district to dis-
trict. To illustrate further this type of distribution, con-
struct an urn schema consisting of 10 urns each of which
contains 15 balls, and in which the number of white balls
in the respective urns is 3, 4, 5, 6, 7, 8, 9, 10, 11, 12. The
arithmetic mean of the probabilities of drawing a white
ball is 1/2. A set of 10 is obtained by drawing one ball
from each urn. Then each ball is returned to the urn
from which it was drawn, and a second set of 10 is drawn.
This process is continued until we have 1,000 sets of 10.
The resulting frequency distribution of the number of
white balls is a Poisson distribution.
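The urn schema just described is readily imitated by machine. The sketch below (the seed and the number of sets are arbitrary choices) compares the variance of the simulated counts with formula (2):

```python
import random

random.seed(7)
probs = [w / 15 for w in range(3, 13)]   # white-ball chances for the 10 urns
s = len(probs)
p = sum(probs) / s                       # = 1/2
q = 1 - p

sets = 100000
counts = [sum(random.random() < pt for pt in probs) for _ in range(sets)]
mean = sum(counts) / sets
var = sum((c - mean) ** 2 for c in counts) / sets

var_B = s * p * q                                     # Bernoulli value spq
var_P = var_B - sum((pt - p) ** 2 for pt in probs)    # formula (2)
print(mean, var, var_P)   # mean near sp = 5; var near var_P, below var_B
```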
53. Lexis series. To give a statistical illustration of a
Lexis series, conceive of n populated districts in each of
which the probability of death is constant for men of
given age, but is variable from district to district.
To develop the theory of the Lexis distribution we
draw s balls one at a time from an urn U₁ with a constant
probability p₁ of getting a white ball, from U₂ with a con-
stant probability p₂, . . . . , from Uₙ with a constant
probability pₙ.
The mathematical expectation of white balls in thus
drawing ns balls is sp₁ + sp₂ + ⋯ + spₙ = nsp, where
p = (1/n)(p₁ + p₂ + ⋯ + pₙ) is the arithmetic mean of
the probabilities p₁, p₂, . . . . , pₙ.
Since nsp is the mathematical expectation of white
balls in samples of ns balls, the mathematical expectation
in samples of s balls one at a time from a random urn is
sp. This value sp is identical to the mathematical ex-
pectation of white balls in samples of s balls of a Bernoulli
series with a constant probability p.
Since p_t is the probability that a ball to be drawn from
urn U_t will be white, the expected value of the square of
the standard deviation of the number of white balls in
samples of s drawn from U_t is s p_t q_t. In other words,
s p_t q_t is the mean square of the deviations of white balls
from sp_t in samples of s drawn from U_t. If the deviations
were measured from sp instead of sp_t, it follows from the
theorem (p. 21) for changing the origin or axis of second
moments that the mean square of the deviations would be

(5) s p_t q_t + (sp_t − sp)² .

Suppose this mean value of the squares of deviations
were obtained from N samples of s each. Then

(6) N s p_t q_t + N s²(p_t − p)²

would be the expected value of the sum of squares of the
deviations from sp in the N samples of s drawn from U_t.
By adding together the expression (6) for t = 1, 2,
. . . . , n, we have

(7) Ns Σ_{t=1}^{n} p_t q_t + Ns² Σ_{t=1}^{n}(p_t − p)²

for the expected value of the sum of squares of the devia-
tions from sp for the n urns. In obtaining (7), we have
drawn in all Nn sets of s balls of which N sets are from
each urn.
The mean-square deviation from sp of the number of
white balls in samples of s thus taken from the n urns
U₁, U₂, . . . . , Uₙ is then obtained by dividing (7) by the
number of sets Nn. This gives

σ_L² = (s/n) Σ_{t=1}^{n} p_t q_t + (s²/n) Σ_{t=1}^{n}(p_t − p)² .

From (4) above,

Σ_{t=1}^{n} p_t q_t = npq − Σ_{t=1}^{n}(p_t − p)² ,

and hence

(8) σ_L² = spq + [s(s − 1)/n] Σ_{t=1}^{n}(p_t − p)² = σ_B² + [s(s − 1)/n] Σ_{t=1}^{n}(p_t − p)² .
It should be observed from (8) that the standard devia-
tion of a Lexis distribution is greater than that of a
Bernoulli distribution based on a constant probability p
which is equal to the mean value of the given probabili-
ties p₁, p₂, . . . . , pₙ.
54. The Lexis ratio. Let σ′ be the standard deviation
of a series of relative frequencies obtained by experiment
from statistical data. On the hypothesis of a Bernoulli
distribution the theoretical value of the standard devia-
tion is σ_B = (pq/s)^{1/2}, where p is the probability of success
in any single trial. The ratio

L = σ′/σ_B = σ/σ_A

is called the Lexis ratio, where σ = sσ′ and σ_A = sσ_B are
the standard deviations of the corresponding distributions
of absolute frequencies. When L = 1, the series of relative
frequencies is said to have
normal dispersion. When L < 1, the series is said to have
subnormal dispersion. When L > 1, the series is said to
have supernormal dispersion. Illustrative applications
of the Lexis ratio to statistical data are readily available.®^
From the nature of the Lexis theory it is fairly obvi-
ous, as implied in the introduction to this chapter, that
TABLE I

State             Births     Deaths per 1,000
California        65,457     66
Connecticut       33,471     72
Indiana           66,544     70
Kansas            40,477     61
Kentucky          62,941     58
Minnesota         57,185     58
North Carolina    61,348     66
Virginia          48,535     68
Wisconsin         61,352     72

Arithmetic mean   55,257     65.7
the application of the theory to particular statistical data
involves breaking up the aggregate into a number of sub-
sets according to some appropriate scheme of classifica-
tion which would ordinarily depend on much knowledge
of the material which is the subject of the investigation.
Then we are concerned not only with a frequency ratio
for the entire aggregate, but also with the stability of
frequency ratios among the subsets. The dispersion of
frequency ratios is calculated and compared with the ex-
pected value in the case of a Bernoulli distribution.
As an example, let us consider the dispersion of death-
rates of white infants under one year of age in registration
states^ of the United States in which the number of
births per year of white children is between 33,000 and
67,000 (see Table I). This restriction is placed on the
selection of states so that the number of instances per set
has only a moderate amount of variability.
In most of the practical problems of statistics the
exact values of the underlying probabilities are unknown
and the best substitutes available are the approximate
values of the probabilities given by available relative fre-
quencies. Substituting these frequency ratios as approxi-
mations for p and q, we find the Bernoulli standard
deviation from the formula σ_B = (pq/s)^{1/2}. We then com-
pare σ_B with the standard deviation obtained directly
from the data. The simple arithmetic mean of the death-
rates is 65.7 per 1,000, and their standard deviation (with-
out weighting) is 5.21 per 1,000. If these infantile death-
rates constituted a Bernoulli distribution with a number
of instances equal to the average number of births, 55,257
in each case, we should have

σ_B = (pq/s)^{1/2} = [(.0657)(.9343)/55,257]^{1/2}
    = .00105 per person = 1.05 per 1,000 .

Hence, the Lexis ratio is

L = 5.21/1.05 = 4.96 .

Hence the dispersion is supernormal, and we have
strong support for the inference that there is a significant
variation in infant mortality from one of these states to
another. The full interpretation of this fact would re-
quire much knowledge of the sources of the material.
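The computation just made can be reproduced directly from Table I; in the sketch below the pairing of death-rates with states follows the order in which they appear in the table (an inference from its layout), and rates are expressed per birth:

```python
births = [65457, 33471, 66544, 40477, 62941, 57185, 61348, 48535, 61352]
rates = [.066, .072, .070, .061, .058, .058, .066, .068, .072]  # deaths per birth

n = len(rates)
p = sum(rates) / n                                       # about .0657
sd_obs = (sum((r - p) ** 2 for r in rates) / n) ** 0.5   # about 5.21 per 1,000
s = sum(births) / n                                      # about 55,257 births
sd_B = (p * (1 - p) / s) ** 0.5                          # about 1.05 per 1,000
L = sd_obs / sd_B                                        # about 5: supernormal
print(round(sd_obs * 1000, 2), round(sd_B * 1000, 2), round(L, 2))
```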
A reasonable plan for the determination of the maxi-
mum district over which the infantile death-rates are
essentially constant seems to involve breaking the aggre-
gate of instances into subsets in a variety of ways and
then testing results as above. Some measure of doubt will
remain, but this procedure encourages the kind of analysis
that gives strong support to induction.
CHAPTER VII
A DEVELOPMENT OF THE GRAM-
CHARLIER SERIES
55. Introduction. In § 56 we shall attempt to show (cf.
p. 65) that a certain line of development^^ of the binomial
distribution suggests the use of the Gram-Charlier Type
A series as a natural extension of the De Moivre-Laplace
approximation and the Type B series as a natural exten-
sion of the Poisson exponential approximation considered
in Chapter II. Then in §§ 57-58 we shall develop meth-
ods for the determination of the parameters in terms of
moments of the observed frequency distribution, thus
deriving certain results stated without proof in § 19
and §21.
56. On a development of Type A and Type B from the
law of repeated trials. As in the De Moivre-Laplace the-
ory, we consider the probability that in a sample of s indi-
viduals, taken at random from an unlimited supply, r
individuals will have a certain attribute. That is, the
probability we wish to represent is given by

B(r) = [s!/(r!(s − r)!)] p^r q^{s−r} ,

and we shall use a function of the form

(1) B₀(x) = (1/2π) ∫_{−π}^{π} θ(w) e^{−xwi} dw ,
for interpolation between the values B(r), where i² = −1
and

(2) θ(w) = (pe^{wi} + q)^s = Σ_{r=0}^{s} B(r) e^{rwi} .
In the terminology of Laplace, θ(w) is the generating func-
tion of the sequence B(r).
We shall first show that B₀(x) = B(m) when x is a
positive integer m. To prove this, substitute θ(w) from
(2) in (1) and integrate. This gives

B₀(x) = Σ_{r=0}^{s} B(r) sin (r − x)π / [(r − x)π] .

When x = m is a positive integer, each term but one of
the right member vanishes and this one has the value
B(m). Accordingly, B₀(m) = B(m).
Thus formula (1) gives exactly the terms of the expan-
sion of (p + q)^s for positive integral values x = m. It may
be considered an interpolation formula for values of x
between the integral values.
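The claim that (1) reproduces the binomial terms at integral arguments can be verified numerically. In the sketch below (the values s = 8, p = 0.4 are illustrative assumptions) the integral in (1) is evaluated by a midpoint rule, which is essentially exact here because the integrand is a trigonometric polynomial of low degree:

```python
import math, cmath

s, p = 8, 0.4                    # illustrative binomial
q = 1 - p
theta = lambda w: (p * cmath.exp(1j * w) + q) ** s    # generating function (2)

def B0(x, steps=2000):
    # formula (1): (1/2π) ∫_{-π}^{π} θ(w) e^{-xwi} dw, by the midpoint rule
    h = 2 * math.pi / steps
    total = 0j
    for k in range(steps):
        w = -math.pi + (k + 0.5) * h
        total += theta(w) * cmath.exp(-1j * x * w)
    return (total * h / (2 * math.pi)).real

exact = math.comb(s, 3) * p ** 3 * q ** 5             # the binomial term B(3)
print(B0(3), exact)    # agree to high precision
```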
We shall be interested in two developments of this
interpolation formula. The first is based on the develop-
ment of log 0{w) in powers of w, and the second on the
development in powers of p. The resulting types of de-
velopment are known as the Type A and Type B series,
respectively.
DEVELOPMENT OF TYPE A
From the form of θ(w) in (2), we have

(3) d log θ(w)/dw = spie^{wi} / (pe^{wi} + q) .

Develop the right-hand member of (3) in powers of
w and we obtain

d log θ(w)/dw = i[sp + spq(wi) − ½spq(p − q)(wi)² + ⋯] .

Thus we have by integration, remembering that θ(0) = 1,

log θ(w) = sp(wi) + (1/2!)spq(wi)² − (1/3!)spq(p − q)(wi)³ + ⋯ ,

or writing

(4) θ(w) = e^{b₁(wi) + b₂(wi)²/2! + b₃(wi)³/3! + ⋯} ,

we have

(5) b₁ = sp ,  b₂ = spq ,  b₃ = −spq(p − q) , . . . .

We now write

(6) θ(w) = e^{b₁(wi) + b₂(wi)²/2}[1 − A₃(wi)³ + A₄(wi)⁴ − ⋯] .

Since it follows from (2) that θ(w) is an entire func-
tion of w, the series in brackets in the right member of
(6) converges since it is the quotient of an entire function
θ(w) by an exponential factor with no singularities in the
finite part of the plane.
From (4), (5), and (6) we have

(7) A₃ = (1/6)spq(p − q) ,  A₄ = (1/24)spq(1 − 6pq) , . . . .
Inserting θ(w) from (6) in (1), we have

(8) B₀(x) = (1/2π) ∫_{−π}^{π} dw e^{−(x−b₁)wi − b₂w²/2}[1 − A₃(wi)³ + A₄(wi)⁴ − ⋯] .

If we write

(9) Ω(x) = (1/2π) ∫_{−π}^{π} dw e^{−(x−b₁)wi − b₂w²/2} ,

we have from (8),

(10) B₀(x) = Ω(x) + A₃ d³Ω/dx³ + A₄ d⁴Ω/dx⁴ + ⋯ .

If, however, b₂ is not small, we may use in place of
Ω(x) the function φ(x) defined as

φ(x) = (1/2π) ∫_{−∞}^{∞} dw e^{−(x−b₁)wi − b₂w²/2} ,

by changing the limits of integration from ±π to ±∞.
Moreover, we shall prove that

(11) φ(x) = [1/(2πb₂)^{1/2}] e^{−(x−b₁)²/2b₂} .

To prove this, we write

φ(x) = (1/2π) ∫_{−∞}^{∞} e^{−b₂w²/2} cos [w(x − b₁)] dw
     − (i/2π) ∫_{−∞}^{∞} e^{−b₂w²/2} sin [w(x − b₁)] dw .
The second term vanishes because the sine is an odd
function. Since the cosine is an even function, we may
write

(12) φ(x) = (1/π) ∫₀^{∞} e^{−b₂w²/2} cos [w(x − b₁)] dw .

Differentiation with regard to x gives

dφ/dx = −(1/π) ∫₀^{∞} e^{−b₂w²/2} w sin [w(x − b₁)] dw .

Integrate the right-hand member by parts and we have

dφ/dx = −[(x − b₁)/b₂] φ(x) .

Then by integration,

(13) φ(x) = A e^{−(x−b₁)²/2b₂} .

To find A, let x = b₁ in (12) and (13). This gives the well
known definite integral

A = (1/π) ∫₀^{∞} e^{−b₂w²/2} dw = 1/(2πb₂)^{1/2} .

Hence, we have

φ(x) = [1/(2πb₂)^{1/2}] e^{−(x−b₁)²/2b₂} ,
Therefore we may write in place of (10) the Type A
series

(14) B₀(x) = φ(x) + A₃ d³φ/dx³ + A₄ d⁴φ/dx⁴ + ⋯ ,

where

φ(x) = [1/(2πσ²)^{1/2}] e^{−(x−b₁)²/2σ²} ,

if σ² = b₂.
To study the degree of approximation secured in
changing the limits of integration from ±π to ±∞ in
passing from Ω(x) to φ(x), we observe that

φ(x) − Ω(x) = (1/π) ∫_{π}^{∞} dw e^{−σ²w²/2} cos [w(x − b₁)]

and hence

|φ(x) − Ω(x)| < (1/π) ∫_{π}^{∞} dw e^{−σ²w²/2} = (1/πσ) ∫_{σπ}^{∞} e^{−λ²/2} dλ ,

if λ = σw.
Hence, the difference approaches zero very rapidly
with increasing values of σ as may be seen by using the
values of the last integral written corresponding to values
of λ = 1, 2, 3, 4, . . . . , in a table of this probability
integral. A similar examination for the derivatives of Ω
and φ will show that their differences similarly approach
zero.
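As a numerical illustration of the Type A series (14), the sketch below (s = 50, p = 0.3 are illustrative assumptions) evaluates the derivatives of φ through the standard Hermite-polynomial relation φ⁽ⁿ⁾(x) = (−1)ⁿHₙ(z)φ(x)/b₂^{n/2}, z = (x − b₁)/√b₂ (stated here as an assumption), with A₃ and A₄ taken from (7), and compares the result with a term of the binomial:

```python
import math

s, p = 50, 0.3                  # illustrative binomial
q = 1 - p
b1, b2 = s * p, s * p * q
A3 = s * p * q * (p - q) / 6            # from (7)
A4 = s * p * q * (1 - 6 * p * q) / 24

def phi_deriv(x, n):
    # nth derivative of phi(x) = (2π b2)^{-1/2} e^{-(x-b1)^2/2b2},
    # via phi^(n)(x) = (-1)^n H_n(z) phi(x) / b2^{n/2}, z = (x-b1)/sqrt(b2)
    z = (x - b1) / math.sqrt(b2)
    H = {0: 1.0, 3: z ** 3 - 3 * z, 4: z ** 4 - 6 * z ** 2 + 3}[n]
    phi = math.exp(-z * z / 2) / math.sqrt(2 * math.pi * b2)
    return (-1) ** n * H * phi / b2 ** (n / 2)

def type_A(x):                  # (14), carried through the A4 term
    return phi_deriv(x, 0) + A3 * phi_deriv(x, 3) + A4 * phi_deriv(x, 4)

r = 15
exact = math.comb(s, r) * p ** r * q ** (s - r)
print(type_A(r), exact)
```

With these values the three-term series reproduces the binomial term to about three decimal places.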
DEVELOPMENT OF TYPE B

To develop (2) in powers of p, we first write, since
pe^{wi} + q = 1 − p(1 − e^{wi}),

(15) d log θ(w)/dw = ispe^{wi} / [1 − p(1 − e^{wi})]
     = ispe^{wi} [1 + p(1 − e^{wi}) + p²(1 − e^{wi})² + ⋯] ,

a convergent series since

|p(1 − e^{wi})| < 1 .
Since θ(0) = 1, we obtain by integration

(16) log θ(w) = −sp(1 − e^{wi}) − (sp²/2)(1 − e^{wi})² − (sp³/3)(1 − e^{wi})³ − ⋯ .

Hence, writing

(17) θ(w) = e^{−sp(1−e^{wi})}[1 + B₂(1 − e^{wi})² + B₃(1 − e^{wi})³ + ⋯] ,

we have

B₂ = −sp²/2 ,  B₃ = −sp³/3 , . . . .

Now, from (1) and (17),

(18) B₀(x) = (1/2π) ∫_{−π}^{π} dw e^{−xwi − sp(1−e^{wi})}[1 + B₂(1 − e^{wi})²
     + B₃(1 − e^{wi})³ + ⋯] .
Let

ψ(x) = (1/2π) ∫_{−π}^{π} dw e^{−xwi − sp(1−e^{wi})} = ∫ Q(w, x) dw .

Then let

Δψ(x) = ψ(x) − ψ(x − 1)
      = (1/2π) ∫_{−π}^{π} dw e^{−xwi − sp(1−e^{wi})} − (1/2π) ∫_{−π}^{π} dw e^{−(x−1)wi − sp(1−e^{wi})}
      = ∫ (1 − e^{wi}) Q(w, x) dw .

Then

Δ²ψ(x) = ∫ (1 − e^{wi})² Q(w, x) dw ,
Δ³ψ(x) = ∫ (1 − e^{wi})³ Q(w, x) dw , . . . .

Hence, we have

(19) B₀(x) = ψ(x) + B₂Δ²ψ(x) + B₃Δ³ψ(x) + ⋯ .
To give other forms to

ψ(x) = (1/2π) ∫_{−π}^{π} e^{−xwi − sp(1−e^{wi})} dw ,

we may write

ψ(x) = (e^{−sp}/2π) ∫_{−π}^{π} e^{−xwi + sp e^{wi}} dw
     = (e^{−sp}/2π) ∫_{−π}^{π} e^{−xwi}[1 + sp e^{wi} + ((sp)²/2!) e^{2wi} + ⋯] dw

(20)  = (e^{−sp}/π) [ sin xπ/x + sp sin (x − 1)π/(x − 1) + ⋯
        + ((sp)^r/r!) sin (x − r)π/(x − r) + ⋯ ]

      = (e^{−sp}/π) sin πx [ 1/x − sp/(x − 1) + (sp)²/((x − 2)2!) − ⋯ ]

(21)  = (e^{−λ}/π) sin πx [ 1/x − λ/(x − 1) + λ²/((x − 2)2!) − ⋯
        + (−1)^r λ^r/((x − r)r!) + ⋯ ] ,

if sp is replaced by λ.
The foregoing analytical processes can be easily justi-
fied by the use of the properties of uniformly convergent
series. When x approaches an integer r, it is easily seen
from (20) that each term approaches zero except the
term

e^{−sp} ((sp)^r/r!) sin (x − r)π / [(x − r)π] ,

and this term has as its limit the Poisson exponential

e^{−λ} λ^r / r! .

The formula (21) may therefore be regarded as defining
the Poisson exponential e^{−λ}λ^x/x! for non-integral values
of x.
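Formula (21) is straightforward to evaluate; the sketch below (λ = 4 and truncation at 60 terms are assumptions) checks that near the integer x = 2 it returns the Poisson exponential e^{−λ}λ²/2!:

```python
import math

lam = 4.0    # λ = sp, an illustrative value

def poisson_interp(x, terms=60):
    # formula (21): e^{-λ} (sin πx / π) Σ_r (-1)^r λ^r / ((x - r) r!),
    # for non-integral x (at an integer the series has a 0/0 term)
    total = sum((-lam) ** r / ((x - r) * math.factorial(r)) for r in range(terms))
    return math.exp(-lam) * math.sin(math.pi * x) / math.pi * total

target = math.exp(-lam) * lam ** 2 / math.factorial(2)   # Poisson term at r = 2
print(poisson_interp(2.0001), target)   # nearly equal
```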
The development in series (19) is useful only when p
is so small that sp is not large, say sp ≤ 10, s being a large
number. In this case, sp is likely to be too small to allow
an expansion in a Type A series. Otherwise, the develop-
ment in Type A is better suited to represent the terms of
the binomial series.
While the above demonstration is limited to the rep-
resentation of the law of probability given by terms of a
binomial, Wicksell has gone much further in the paper
cited above in showing a line of development which sug-
gests the use of the Gram-Charlier series for the represen-
tation of the law of probability given by terms of the
hypergeometric series, thus representing the law of prob-
ability which gives the basis of the Pearson system of
generalized frequency curves. Unfortunately, the demon-
stration of this extension would require somewhat more
space devoted to formal analysis than seems desirable in
the present monograph. Hence we merely state the above
fact without a demonstration.
57. The values of the coefficients of the Type A series
obtained from the biorthogonal property. If in (14) we
measure x from the centroid as an origin and in units
equal to the standard deviation, σ, we may write in place
of (14)

(22) F(x) = a₀φ(x) + a₃φ⁽³⁾(x) + a₄φ⁽⁴⁾(x) + ⋯ ,

where

φ(x) = (2π)^{−1/2} e^{−x²/2} ,

and φ⁽ⁿ⁾(x) is the nth derivative of φ(x) with respect to x.
The coefficients aₙ (n = 0, 3, 4, . . . .) in the Type A
series may be easily expressed in terms of moments of area
under the given frequency curve about the centroidal
ordinate because the functions φ⁽ⁿ⁾(x) and the Hermite
polynomials Hₙ(x) defined by the equation

φ⁽ⁿ⁾(x) = (−1)ⁿ Hₙ(x) φ(x)
form a biorthogonal system. Thus,

(23) ∫_{−∞}^{∞} φ⁽ⁿ⁾(x) Hₘ(x) dx = 0   (m ≠ n) ,

(24) and ∫_{−∞}^{∞} φ⁽ⁿ⁾(x) Hₙ(x) dx = (−1)ⁿ n!   (m = n) ,

and this biorthogonal property affords a simple method of
determining the coefficients in the Type A series.
To prove (23) and (24) we may write

(25) ∫_{−∞}^{∞} φ⁽ⁿ⁾(x) Hₘ(x) dx = (−1)ⁿ ∫_{−∞}^{∞} φ(x) Hₙ(x) Hₘ(x) dx
    = (−1)^{m+n} ∫_{−∞}^{∞} φ⁽ᵐ⁾(x) Hₙ(x) dx .

Integration by parts gives

∫_{−∞}^{∞} φ⁽ⁿ⁾(x) Hₘ(x) dx = [φ⁽ⁿ⁻¹⁾(x) Hₘ(x)]_{−∞}^{∞}
    − ∫_{−∞}^{∞} φ⁽ⁿ⁻¹⁾(x) H′ₘ(x) dx = − ∫_{−∞}^{∞} φ⁽ⁿ⁻¹⁾(x) H′ₘ(x) dx .

Continuing until we have performed m + 1 successive inte-
grations by parts, we obtain, assuming n > m,

∫_{−∞}^{∞} φ⁽ⁿ⁾(x) Hₘ(x) dx =
    (−1)^{m+1} ∫_{−∞}^{∞} φ⁽ⁿ⁻ᵐ⁻¹⁾(x) Hₘ⁽ᵐ⁺¹⁾(x) dx ,
where Hₘ⁽ᵐ⁺¹⁾(x) is the (m+1)th derivative of Hₘ(x).
Since Hₘ(x) is a polynomial of degree m in x, its (m+1)th
derivative vanishes and we have

(26) ∫_{−∞}^{∞} φ⁽ⁿ⁾(x) Hₘ(x) dx = 0

for n > m. But from the form of (25), it is obvious that
we could equally well prove (26) for m > n. For m = n, we
proceed as above with n successive integrations. We
then have, if we replace m by n,

∫_{−∞}^{∞} φ⁽ⁿ⁾(x) Hₙ(x) dx = (−1)ⁿ ∫_{−∞}^{∞} φ(x) Hₙ⁽ⁿ⁾(x) dx .

But the nth derivative Hₙ⁽ⁿ⁾(x) of the polynomial Hₙ(x)
is equal to n!. Hence,

(27) ∫_{−∞}^{∞} φ⁽ⁿ⁾(x) Hₙ(x) dx = (−1)ⁿ n! ∫_{−∞}^{∞} φ(x) dx
    = (−1)ⁿ n! .
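Relations (23), (24), and (27) admit a direct numerical check. In the sketch below the Hermite polynomials are generated by the recurrence Hₖ₊₁(x) = xHₖ(x) − kHₖ₋₁(x) (a standard consequence of the definition above, taken here as an assumption), and the integrals are approximated by a fine midpoint sum, which is extremely accurate for integrands that vanish at the ends of the interval:

```python
import math

def H(n, x):
    # Hermite polynomials with phi^(n)(x) = (-1)^n H_n(x) phi(x):
    # H_0 = 1, H_1 = x, H_{k+1}(x) = x H_k(x) - k H_{k-1}(x)
    a, b = 1.0, x
    if n == 0:
        return a
    for k in range(1, n):
        a, b = b, x * b - k * a
    return b

phi = lambda x: math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def inner(n, m, lo=-12.0, hi=12.0, steps=6000):
    # ∫ phi^(n)(x) H_m(x) dx = (-1)^n ∫ phi H_n H_m dx, by a midpoint sum
    h = (hi - lo) / steps
    t = sum(phi(lo + (i + 0.5) * h) * H(n, lo + (i + 0.5) * h) * H(m, lo + (i + 0.5) * h)
            for i in range(steps))
    return (-1) ** n * h * t

print(inner(3, 4))                         # vanishes, as in (23)
print(inner(3, 3), -math.factorial(3))     # equals -6, as in (24)
```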
By multiplying both members of (22) by Hₙ(x) and
integrating under the assumption that the series is uni-
formly convergent, we have

∫_{−∞}^{∞} F(x) Hₙ(x) dx = aₙ ∫_{−∞}^{∞} φ⁽ⁿ⁾(x) Hₙ(x) dx = (−1)ⁿ n! aₙ ,

since by application of (26) all terms of the right-hand
member vanish except the one with the coefficient aₙ.
Hence,

(28) aₙ = [(−1)ⁿ/n!] ∫_{−∞}^{∞} F(x) Hₙ(x) dx .
Moreover, to determine an numerically for an observed
frequency distribution we replace F(x) in (28) by the
observed frequency function /(jc).
For purposes of numerical application, let us now
change back from the standard deviation as a unit
to measuring x in the ordinary unit of measurement
(feet, pounds, etc.) involved in the problem, but still keep
the origin at the centroid. This means that we replace
X in (28) by x/<t. If in these units /(ic) gives the observed
frequency distribution, we may write in place of (28)
(29)
$$a_n=\frac{(-1)^n}{n!}\int_{-\infty}^{\infty}f(x)\,H_n(x/\sigma)\,\frac{dx}{\sigma}=\frac{(-1)^n}{n!\,\sigma}\int_{-\infty}^{\infty}f(x)\,H_n(x/\sigma)\,dx .$$
Since $H_n(x/\sigma)$ is a polynomial of degree $n$ in $x$, the co-
efficients $a_n$ are thus given in terms of moments of area
under the observed frequency curve. It is then fairly
obvious that the determination of the moments of area
under the frequency curve plays an important part in the
Gram-Charlier system as well as in the Pearson system.
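Because the sample average of $H_n$ of the standardized variates estimates the moment integral in (29), the coefficients can be computed directly from observed data. The following is a minimal sketch of ours, not from the text; the helper name `type_a_coefficients` and the toy sample are illustrative assumptions.

```python
# Sketch of formula (29): estimate the Type A coefficients a_n by averaging
# probabilists' Hermite polynomials of the standardized observed variates.
import numpy as np
from math import factorial
from numpy.polynomial.hermite_e import hermeval

def type_a_coefficients(sample, n_max=4):
    """Return a_0, ..., a_{n_max} with the origin at the centroid and sigma as unit."""
    sample = np.asarray(sample, dtype=float)
    z = (sample - sample.mean()) / sample.std()   # standardized variates
    coeffs = []
    for n in range(n_max + 1):
        c = np.zeros(n + 1)
        c[n] = 1.0
        # a_n = (-1)^n / n! times the mean of H_n over the sample; the sample
        # average plays the role of the moment integral in (29).
        coeffs.append((-1) ** n / factorial(n) * hermeval(z, c).mean())
    return coeffs

a = type_a_coefficients([1.2, 0.7, 3.1, 2.2, 1.9, 0.4, 2.8, 1.5])
# Standardization forces a_0 = 1 and a_1 = a_2 = 0; a_3 and a_4 then carry
# the skewness and the excess of the observed distribution.
```

In particular $a_3=-\mu_3/6$ and $a_4=(\mu_4-3)/24$ in standardized moments, which is the moment dependence noted above.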
58. The values of the coefficients of Type A series
obtained from a least-squares criterion. It may be proved
by following J. P. Gram that the value of any coefficient
$a_n$ obtained in § 57 by the use of the biorthogonal property
is the same as that obtained by finding the best approxi-
mation to $f(x)$, in the sense of a certain least-squares
criterion, by the first $m$ terms of the series ($m\ge n$). To
prove this statement, we may proceed as follows: Con-
sider the series
$$F(x)=a_0\phi(x)+a_1\phi^{(1)}(x)+\cdots+a_m\phi^{(m)}(x)$$
for the representation of an observed frequency function
$f(x)$. The least-squares criterion²⁵ that

(30)
$$V=\int_{-\infty}^{\infty}\frac{1}{\phi(x)}\,\bigl[f(x)-F(x)\bigr]^2\,dx$$
shall be a minimum leads to the values of the coefficients
given in § 57.
To prove this, we square the binomial $f(x)-F(x)$ and
differentiate partially with regard to the parameters
$a_0$, $a_1$, . . . . , $a_m$. This gives
$$\frac{\partial V}{\partial a_n}=-2\,\frac{\partial}{\partial a_n}\int_{-\infty}^{\infty}\frac{f(x)\,F(x)}{\phi(x)}\,dx+\frac{\partial}{\partial a_n}\int_{-\infty}^{\infty}\frac{[F(x)]^2}{\phi(x)}\,dx$$
$$=2(-1)^{n+1}\int_{-\infty}^{\infty}f(x)\,H_n(x)\,dx+2a_n\int_{-\infty}^{\infty}[H_n(x)]^2\,\phi(x)\,dx ,$$
since
$$\int_{-\infty}^{\infty}\frac{[F(x)]^2}{\phi(x)}\,dx=\int_{-\infty}^{\infty}\bigl[a_0^2[H_0(x)]^2+a_1^2[H_1(x)]^2+\cdots+a_m^2[H_m(x)]^2\bigr]\,\phi(x)\,dx ,$$
the product terms vanishing because of (26).
Making $\partial V/\partial a_n=0$, we have

(31)
$$2(-1)^{n+1}\int_{-\infty}^{\infty}f(x)\,H_n(x)\,dx+2a_n\int_{-\infty}^{\infty}[H_n(x)]^2\,\phi(x)\,dx=0 .$$
But

(32)
$$\int_{-\infty}^{\infty}[H_n(x)]^2\,\phi(x)\,dx=(-1)^n\int_{-\infty}^{\infty}\phi^{(n)}(x)\,H_n(x)\,dx=n! .$$
From (31) and (32), we have
$$a_n=\frac{(-1)^n}{n!}\int_{-\infty}^{\infty}f(x)\,H_n(x)\,dx ,$$
which is identical with the value obtained by the use of
the biorthogonal property.
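The equivalence just proved can also be observed numerically: fitting $F(x)=\sum a_k\phi^{(k)}(x)$ by least squares with the weight $1/\phi(x)$ of (30) recovers the coefficients of (28). The sketch below is ours, under stated assumptions: a finite grid stands in for the integral, and the test function is built from known coefficients.

```python
# Illustration that minimizing the weighted criterion (30) reproduces the
# biorthogonal coefficients (28): fit F(x) = sum a_k phi^(k)(x) to a known f
# by least squares with weight 1/phi(x) and recover the a_k used to build f.
import numpy as np
from numpy.polynomial.hermite_e import hermeval

x = np.linspace(-8.0, 8.0, 20_001)
phi = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)

def phi_deriv(k):
    """phi^(k)(x) = (-1)^k H_k(x) phi(x) for the probabilists' H_k."""
    c = np.zeros(k + 1)
    c[k] = 1.0
    return (-1) ** k * hermeval(x, c) * phi

true_a = np.array([1.0, 0.0, 0.0, -0.05, 0.0])
f = sum(a_k * phi_deriv(k) for k, a_k in enumerate(true_a))

# Weighted least squares: scaling both sides by 1/sqrt(phi) makes the
# ordinary residual sum of squares match the criterion (30) on the grid.
design = np.column_stack([phi_deriv(k) for k in range(5)])
w = 1.0 / np.sqrt(phi)
fitted_a, *_ = np.linalg.lstsq(design * w[:, None], f * w, rcond=None)
```

Note that the weighted columns are $H_k(x)\sqrt{\phi(x)}$, so the normal equations are nearly diagonal, which is the algebraic convenience of the weighting remarked on in note 25.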
59. The coefficients of a Type B series. In consider-
ing the determination of the coefficients $c_0$, $c_1$, $c_2$, . . . . ,
of the Type B series, we shall restrict our treatment to a
distribution of equally distant ordinates at non-negative
integral values of $x$, and shall for simplicity consider the
representation by the first three terms of the series. That
is, we write
$$F(x)=c_0\psi(x)+c_1\Delta\psi(x)+c_2\Delta^2\psi(x) ,$$
where
$$\psi(x)=\frac{e^{-\lambda}\lambda^{x}}{x!}$$
for $x=0, 1, 2, \ldots$ Let $f(x)$ give the ordinates of the
observed distribution of relative frequencies, so that
$$\sum f(x)=1 .$$
Equating sums of ordinates and first and second
moments $\mu_1'$ and $\mu_2'$ of ordinates of the theoretical and
COEFFICIENTS OF TYPE B SERIES 171
observed distributions, we may now determine the co-
efficients approximately from the equations:

(33)
$$\sum\bigl[c_0\psi(x)+c_1\Delta\psi(x)+c_2\Delta^2\psi(x)\bigr]=\sum f(x)=1 ,$$
$$\sum x\bigl[c_0\psi(x)+c_1\Delta\psi(x)+c_2\Delta^2\psi(x)\bigr]=\sum x\,f(x)=\mu_1' ,$$
$$\sum x^2\bigl[c_0\psi(x)+c_1\Delta\psi(x)+c_2\Delta^2\psi(x)\bigr]=\sum x^2 f(x)=\mu_2' .$$
Before solving these equations for $c_0$, $c_1$, and $c_2$, we
may simplify by the substitution of certain values which
are close approximations when we are dealing with large
numbers. Thus, we recall that we have derived in § 14,
Chapter II, the following approximations:
$$\sum\psi(x)=1 ,\qquad \sum x\,\psi(x)=\lambda ,\qquad \mu_2'=\sum x^2\,\psi(x)=\lambda+\lambda^2 .$$
We may next easily obtain the following further ap-
proximate values:
$$\sum x\,\Delta\psi(x)=\sum x\bigl[\psi(x)-\psi(x-1)\bigr]=\lambda-\lambda-1=-1 .$$
Similarly, it is easily shown that
$$\sum x\,\Delta^2\psi(x)=0 ,\qquad \sum x^2\,\Delta\psi(x)=-2\lambda-1 ,$$
and
$$\sum x^2\,\Delta^2\psi(x)=2 .$$
Substituting these values in equations (33), we obtain
$$c_0=1 ,\qquad \lambda c_0-c_1=\mu_1' ,\qquad (\lambda+\lambda^2)c_0-(2\lambda+1)c_1+2c_2=\mu_2' .$$
If we take $\lambda=\mu_1'$, we have the coefficient $c_1=0$.
Then expressing the second moment $\mu_2'$ in terms of
the second moment $\mu_2$ about the mean by the relation
$\mu_2'=\mu_2+\lambda^2$, we have
$$\lambda+\lambda^2+2c_2=\mu_2+\lambda^2 ,\qquad c_2=\tfrac{1}{2}(\mu_2-\lambda) .$$
Hence, we write
$$F(x)=\psi(x)+\tfrac{1}{2}(\mu_2-\lambda)\,\Delta^2\psi(x) ,$$
when $\lambda$ is taken equal to the first moment $\mu_1'$, which is
the arithmetic mean of the values of the given variates.
It is fairly obvious that this application of moments
to finding values of the coefficients could be extended to
more terms if they were needed in dealing with actual
data.
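The steps above reduce to a short computation: take $\lambda$ equal to the arithmetic mean, $c_2=\frac{1}{2}(\mu_2-\lambda)$, and add $c_2\,\Delta^2\psi(x)$ to the Poisson ordinate. A sketch with illustrative names of our own (`fit_type_b` and the toy frequencies are not from the text):

```python
# Sketch of the Type B fit just derived: lambda = arithmetic mean,
# c_2 = (mu_2 - lambda)/2, F(x) = psi(x) + c_2 * second difference of psi.
import numpy as np
from math import exp, factorial

def fit_type_b(freq):
    """freq[x] gives observed relative frequencies at x = 0, 1, 2, ...; sum(freq) == 1."""
    xs = np.arange(len(freq))
    lam = float(np.sum(xs * freq))                 # first moment = arithmetic mean
    mu2 = float(np.sum((xs - lam) ** 2 * freq))    # second moment about the mean
    c2 = 0.5 * (mu2 - lam)

    def psi(x):
        return exp(-lam) * lam**x / factorial(x) if x >= 0 else 0.0

    def F(x):
        # second (backward) difference of psi, as in the derivation above
        d2 = psi(x) - 2 * psi(x - 1) + psi(x - 2)
        return psi(x) + c2 * d2

    return F

# An over-dispersed count distribution (mu_2 > lambda) as a toy example:
F = fit_type_b(np.array([0.25, 0.30, 0.20, 0.15, 0.10]))
total = sum(F(x) for x in range(60))   # the fitted ordinates still sum to ~1
```

Because $\sum\Delta^2\psi=0$ and $\sum x\,\Delta^2\psi=0$, the correction term disturbs neither the total frequency nor the mean, which is exactly what equations (33) were designed to secure.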
NOTES
1. Page 1. Émile Borel, Éléments de la théorie des probabilités, p. 167;
Le hasard, p. 154.
2. Page 23. Julian L. Coolidge, An introduction to mathematical
probability (1925), pp. 13-32.
3. Page 30. Tchebycheff, Des valeurs moyennes, Journal de Mathé-
matiques (2), Vol. 12 (1867), pp. 177-84.
4. Page 30. M. Bienaymé, Considérations à l'appui de la découverte
de Laplace sur la loi de probabilité dans la méthode des moindres carrés,
Comptes Rendus, Vol. 37 (1853), pp. 309-24.
5. Page 31. E. L. Dodd, The greatest and the least variate under
general laws of error, Transactions of the American Mathematical Society,
Vol. 25 (1923), pp. 525-39.
6. Pages 31 and 47. Some writers call this theorem the "Bernoulli
theorem" and others call it the "Laplace theorem." It has been shown
recently by Karl Pearson that most of the credit for the theorem should
go to De Moivre rather than to Bernoulli. For this reason we call the
theorem the "De Moivre-Laplace theorem" rather than the "Bernoulli-
Laplace theorem." See Historical note on the origin of the normal curve of
errors, Biometrika, Vol. 16 (1924), p. 402; also James Bernoulli's theorem,
Biometrika, Vol. 17 (1925), p. 201.
7. Page 32. For a proof see Coolidge, An introduction to mathematical
probability, pp. 38-42.
8. Pages 37 and 68. James W. Glover, Tables of applied mathematics
(1923), pp. 392-411.
9. Page 37. Karl Pearson, Tables for statisticians and biometricians
(1914), pp. 2-9.
10. Page 39. Poisson, Recherches sur la probabilité des jugements,
Paris, 1837, pp. 205 ff.
11. Page 39. Bortkiewicz, Das Gesetz der kleinen Zahlen, Leipzig,
1898.
12. Page 48. For various proofs of the normal law, see David Brunt,
The combination of observations (1917), pp. 11-24; also Czuber, Beobach-
tungsfehler (1891), pp. 48-110.
13. Page 50. Karl Pearson, Mathematical contributions to the theory
of evolution, Philosophical Transactions, A, Vol. 186 (1895), pp. 343-414.
14. Page 50. Karl Pearson, Supplement to a memoir on skew variation,
Philosophical Transactions, A, Vol. 197 (1901), pp. 443-56.
15. Page 50. Karl Pearson, Second supplement to a memoir on skew
variation, Philosophical Transactions, A, Vol. 216 (1916), pp. 429-57.
16. Page 60. J. P. Gram, Om Raekkeudviklinger (Doctor's dissertation),
Copenhagen, 1879; also Über die Entwickelung reeller Functionen in Reihen
mittelst der Methode der kleinsten Quadrate, Journal für Mathematik,
Vol. 94 (1883), pp. 41-73.
17. Page 60. T. N. Thiele, Almindelig Iagttagelseslaere, Copenhagen,
1889; cf. Thiele, Theory of observations, 1903.
18. Page 60. F. Y. Edgeworth, The asymmetrical probability-curve,
Philosophical Magazine, Vol. 41 (1896), pp. 90-99; also The law of error,
Cambridge Philosophical Transactions, Vol. 20 (1904), pp. 36-65, 113-41.
19. Page 60. G. T. Fechner, Kollektivmasslehre (ed., G. R. Lipps),
1897.
20. Page 60. H. Bruns, Über die Darstellung von Fehlergesetzen,
Astronomische Nachrichten, Vol. 143, No. 3429 (1897); also Wahrschein-
lichkeitsrechnung und Kollektivmasslehre, 1906.
21. Page 60. C. V. L. Charlier, Über das Fehlergesetz, Arkiv för
Matematik, Astronomi och Fysik, Vol. 2, No. 8 (1905), pp. 1-9; also
Über die Darstellung willkürlicher Funktionen, Arkiv för Matematik,
Astronomi och Fysik, Vol. 2, No. 20 (1905), pp. 1-35.
22. Page 60. V. Romanovsky, Generalization of some types of fre-
quency curves of Professor Pearson, Biometrika, Vol. 16 (1924), pp. 106-17.
23. Page 63. Wera Myller-Lebedeff, Die Theorie der Integralgleich-
ungen in Anwendung auf einige Reihenentwicklungen, Mathematische
Annalen, Vol. 64 (1907), pp. 388-416.
24. Pages 65 and 156. S. D. Wicksell, Contributions to the analytical
theory of sampling, Arkiv för Matematik, Astronomi och Fysik, Vol. 17,
No. 19 (1923), pp. 1-46.
25. Pages 67 and 169. In the use of the least-squares criterion that
V in (16), § 20, and in (30), § 58, shall be a minimum, a question naturally
arises as to the propriety of weighting squares of deviations with
the reciprocal $1/\phi(x)=(2\pi)^{1/2}e^{x^2/2}$ of the normal function. Gram used
this weighting without commenting on its propriety so far as the writer
has been able to learn. One fairly obvious point in support of the
weighting is its algebraic convenience.
26. Page 68. N. R. Jorgensen, Undersogelser over Frequensflader og
Korrelation (1916), pp. 178-93.
27. Page 74. H. L. Rietz, Frequency distributions obtained by certain
transformations of normally distributed variates, Annals of Mathematics,
Vol. 23 (1922), pp. 292-300.
28. Page 74. S. D. Wicksell, On the genetic theory of frequency, Arkiv
för Matematik, Astronomi och Fysik, Vol. 12, No. 20 (1917), pp. 1-56.
29. Page 75. E. L. Dodd, The frequency law of a function of variables
with given frequency laws, Annals of Mathematics, Ser. 2, Vol. 27, No. 1
(1925), pp. 12-20.
30. Page 75. S. Bernstein, Sur les courbes de distribution des proba-
bilités, Mathematische Zeitschrift, Vol. 24 (1925), pp. 199-211.
31. Page 81. Francis Galton, Proceedings of the Royal Society, Vol.
40 (1886), Appendix by J. D. Hamilton Dickson, p. 63.
32. Page 81. Karl Pearson, Mathematical contributions to the theory
of evolution III, Philosophical Transactions, A, Vol. 187 (1896), pp. 253-
318.
33. Page 81. G. Udny Yule, On the significance of Bravais' formulae
for regression, etc., Proceedings of the Royal Society, Vol. 60 (1897),
pp. 477-89.
34. Page 84. E. V. Huntington, Mathematics and statistics, American
Mathematical Monthly, Vol. 26 (1919), p. 424.
35. Page 86. H. L. Rietz, On functional relations for which the co-
efficient of correlation is zero, Quarterly Publications of the American
Statistical Association, Vol. 16 (1919), pp. 472-76.
36. Page 91. Karl Pearson, On a correction to be made to the correla-
tion ratio, Biometrika, Vol. 8 (1911-12), pp. 254-56; see also Student,
The correction to be made to the correlation ratio for grouping, Biometrika,
Vol. 9 (1913), pp. 316-20.
37. Page 92. Karl Pearson, On a general method of determining the
successive terms in a skew regression line, Biometrika, Vol. 13 (1920-21),
pp. 296-300.
38. Page 100. Maxime Bôcher, Introduction to higher algebra (1912),
p. 33.
39. Page 101. L. Isserlis, On the partial correlation ratio, Biometrika,
Vol. 10 (1914-15), pp. 391-411.
40. Page 101. Karl Pearson, On the partial correlation ratio, Proceed-
ings of the Royal Society, A, Vol. 91 (1914-15), pp. 492-98.
41. Page 102. H. L. Rietz, Urn schemata as a basis for the development
of correlation theory, Annals of Mathematics, Vol. 21 (1920), pp. 306-22.
42. Page 103. A. A. Tschuprow, Grundbegriffe und Grundprobleme
der Korrelationstheorie (1925).
43. Page 109. H. L. Rietz, On the theory of correlation with special
reference to certain significant loci on the plane of distribution, Annals of
Mathematics (Second Series), Vol. 13 (1912), pp. 195-96.
44. Pages 111 and 112. Karl Pearson, On the theory of contingency
and its relation to association and normal correlation, Drapers' Company
Research Memoirs (Biometric Series I), (1904), p. 10.
45. Page 111. E. Czuber, Theorie der Beobachtungsfehler (1891), pp.
355-82.
46. Page 111. James McMahon, Hyperspherical goniometry; and its
application to correlation theory for N variables, Biometrika, Vol. 15 (1923),
pp. 192-208; paper edited by F. W. Owens after the death of Professor
McMahon.
47. Page 112. Karl Pearson, On the correlation of characters not
quantitatively measurable, Philosophical Transactions, A, Vol. 195 (1900),
pp. 1-47.
48. Page 112. Karl Pearson, On further methods of determining cor-
relation, Drapers' Company Research Memoirs (Biometric Series IV)
(1907), pp. 10-18.
49. Page 112. Warren M. Persons, Correlation of time series (Hand-
book of Mathematical Statistics [1924], pp. 150-65).
50. Page 112. C. Gini, Nuovi contributi alla teoria delle relazioni
statistiche, Atti del R. Istituto Veneto di S.L.A., Tome 74, P. II (1914-15).
51. Page 112. Louis Bachelier, Calcul des probabilités (1912), chaps.
17 and 18.
52. Page 113. Seimatsu Narumi, On the general forms of bivariate
frequency distributions which are mathematically possible when regression
and variation are subjected to limiting conditions, Biometrika, Vol. 15
(1923), pp. 77-88, 209-21.
53. Page 113. Karl Pearson, Notes on skew frequency surfaces, Bio-
metrika, Vol. 15 (1923), pp. 222-44.
54. Page 113. Burton H. Camp, Mutually consistent multiple re-
gression surfaces, Biometrika, Vol. 17 (1925), pp. 443-58.
55. Page 126. G. Bohlmann, Formulierung und Begründung zweier
Hilfssätze der mathematischen Statistik, Mathematische Annalen, Vol. 74
(1913), pp. 341-409.
56. Page 134. Burton H. Camp, Problems in sampling, Journal of
the American Statistical Association, Vol. 18 (1923), pp. 964-77.
57. Page 138. Student, The probable error of a mean, Biometrika,
Vol. 6 (1908-9), pp. 1-25.
58. Page 138. Karl Pearson, On the distribution of the standard de-
viations of small samples: Appendix I. To papers by "Student" and R. A.
Fisher, Biometrika, Vol. 10 (1914-15), pp. 522-29.
59. Page 139. R. A. Fisher, Frequency distribution of the values of the
correlation coefficient in samples from an indefinitely large population,
Biometrika, Vol. 10 (1914-15), pp. 507-21.
60. Page 139. H. E. Soper and Others, On the distribution of the
correlation coefficient in small samples, Appendix II to the papers of "Stu-
dent" and R. A. Fisher, Biometrika, Vol. 11 (1915-17), pp. 328-413.
61. Page 142. Karl Pearson, On generalised Tchebycheff theorems in
the mathematical theory of statistics, Biometrika, Vol. 12 (1918-19), pp.
284-96.
62. Page 143. M. Alf. Guldberg, Sur le théorème de M. Tchebychef,
Comptes Rendus, Vol. 175 (1922), p. 418; also Sur quelques inégalités dans
le calcul des probabilités, Vol. 175 (1922), p. 1382. M. Birger Meidell, Sur
un problème du calcul des probabilités et les statistiques mathématiques,
Comptes Rendus, Vol. 175 (1922), p. 806; also Sur la probabilité des erreurs,
Comptes Rendus, Vol. 176 (1923), p. 280. B. H. Camp, A new generaliza-
tion of Tchebycheff's statistical inequality, Bulletin of the American Mathe-
matical Society, Vol. 28 (1922), pp. 427-32. Seimatsu Narumi, On further
inequalities with possible applications to problems in the theory of probability,
Biometrika, Vol. 15 (1923), p. 245.
63. Page 147. W. Lexis, Über die Theorie der Stabilität statistischer
Reihen, Jahrbuch für Nationalökonomie und Statistik, Vol. 32 (1879),
pp. 60-98; Abhandlungen zur Theorie der Bevölkerungs- und Moralstatistik,
Kap. V-IX (1903).
64. Page 147. C. V. L. Charlier, Vorlesungen über die Grundzüge der
mathematischen Statistik (1920), p. 5.
65. Page 147. J. M. Keynes, A treatise on probability (1921), p. 393.
66. Page 153. In this connection, the expression "L = 1" means
"L = 1 apart from chance fluctuations."
67. Page 153. Handbook of Mathematical Statistics (1924), pp. 88-91;
C. V. L. Charlier, Vorlesungen über die Grundzüge der mathematischen
Statistik (1920), pp. 38-42.
68. Page 154. Birth statistics for the registration area of the United
States (1921), p. 37.
INDEX
(Numbers refer to pages)
Arithmetic mean and mathemati-
cal expectation, 14-16
Bachelier, 112
Bernoulli, 2; distribution, 23;
theorem of, 27-31; series, 146
Bernstein, 75
Bertrand, 109
Bielfeld, J. F. von, 2
Bienaymé-Tchebycheff criterion,
28-29; generalization of, 140-44
Binomial distributions, 22-27, 51
Bôcher, 175
Bohlmann, 126
Borel, 1
Bortkiewicz, 39
Bravais, 3
Bruns, 49, 60
Brunt, 173
Camp, 113, 134, 143, 144
Carver, 75
Cattell and Brimhall, 38
Charlier, 2, 49, 60-67, 156-77
Coefficient of alienation, 87
Coolidge, 23, 173
Correlation, 77-113; meaning of,
77-78; regression method, 78-
103; correlation surface method,
79, 104-11; correlation coeffi-
cient, 82; linear regression, 84;
non-linear regression, 88; corre-
lation ratio, 88-91; multiple, 92-
102; partial, 98-101; standard
deviation of arrays — standard
error of estimate, 87-90, 95;
multiple correlation coefficient,
97; partial correlation coeffi-
cient, 98-99; multiple correla-
tion ratio, 101; normal correla-
tion surfaces, 104-11; of errors,
122-24
Czuber, 173
De Moivre, 2, 3
De Moivre-Laplace theory, 31-38,
43-45, 156
Deviation: quartile, 38; standard,
27,47
Dickson, J. Hamilton, 81
Discrepancy, 26; relative, 27
Dispersion: normal, 3, 153; sub-
normal, 3, 153; supernormal, 4,
153; measures of, 14, 27, 153
Dodd, 31, 75
Edgeworth, 2, 49, 60, 73
Elderton, 53, 60
Ellipse of maximum probability,
109
Error; see Probable error and
Standard error
Euler, 3
Excess, 71-72
Fechner, 49, 60
Fisher, R. A., 139
Frequency, relative, 6
Frequency curves: defined, 13;
normal, 34, 47; generalized, 48-
76
Frequency distribution, observed
and theoretical, 12-14
Frequency functions: defined, 13;
of one variable, 46-76; normal,
34, 47; generalized, 4S-76
Galton, 81-82
Gauss, 3, 47
Generating function, 60, 75, 76,
157
Gini, 112
Glover, tables of applied mathe-
matics, 37, 68
Gram, 2, 49, 60
Gram-Charlier series, 60, 61, 65,
72, 75-76; development of, 156-
77; coefficients of, 65-68, 165-70
Guldberg, 143
Hermite polynomials, 66, 75-76,
165
Heteroscedastic system, 88
Homoscedastic system, 88
Huntington, 175
Hypergeometric series, 52, 165
Isserlis, 101
Jacobi polynomials, 75
Jorgensen, 68
Laguerre polynomials, 76
Laplace, 2, 3; see De Moivre-La-
place theory
Lexis, 2, 3; theory, 146-55; series,
146, 150; ratio, 152
Maclaurin, 3
McMahon, 111
Mathematical expectation, 14-16,
116-17; of the power of a varia-
ble, 18-21; of successes, 26
Median, 134; standard error in,
135
Meidell, 143-44
Mode and most probable value,
17-18, 25
Moments: defined, 18; about an
arbitrary origin, 18; about the
arithmetic mean, 19; applied to
Pearson's system of frequency
curves, 58; coefficients of Gram-
Charlier series in terms of, 66,
168, 170
Most probable number of suc-
cesses, 25
Multiple correlation coefficient, 97
Myller-Lebedeff, 174
Narumi, 113, 143
Normal correlation surfaces, 104-
11
Normal frequency curve, 34, 47,
50; generalized, 60, 61, 65-69,
156-61
Partial correlation coefficient, 98-
99
Pearson, 2, 47, 49; generalized fre-
quency curves, 50-60, 75, 81, 92,
101, 111-13, 138, 142, 144-45,
175
Pearson's system of frequency
curves, 50-60, 75
Persons, 176
Poisson, 3; exponential function,
39-45, 61, 164; series, 148
Population, 4
Probability: meaning of, 6-11; a
priori, 10; a posteriori, 10; statis-
tical, 10
Probable ellipse, 108
Probable error, 39, 127-30, 132
Quartile deviation, 39
Quetelet, 3
Random sampling fluctuations,
114-45
Regression: curve defined, 79; lin-
ear, 84, 93; non-linear, 88-91,
101; surface defined, 93; method
of, 78-103
Relative frequency and probabil-
ity, 6-11
Rietz, 174, 175
Romanovsky, 60, 75
Scedastic curve, 87
Sheppard, 37
Simple sampling, 22-44, 114-36
Skewness, 68-71
Small samples, 138-40
Soper, Young, Cave, Lee, and
Pearson, 139
Standard deviation, 27, 82; of
random sampling fluctuations,
116; of sum of independent vari-
ables, 136; see Standard error
Standard error: defined, 119; in
class frequencies, 119-21; in
arithmetic mean, 126-27; in qth
moment, 130-32; in median,
134; in averages from small
samples, 138-40
Stirling, 3, 32
Student, 138
Tchebycheff, 2, 28-30, 140-44
Thiele, 49, 60
Tschuprow, 102
Whittaker, Lucy, 43
Whittaker and Robinson, 64
Wicksell, 65, 74, 164
Yule, 81